I.  INTRODUCTION 


1.1  Motivation  and  Discussion  of  Problem 

Reliability  and  availability  have  become  two  of  the  prime 
considerations  in  the  design  of  control  systems  for  a  diverse  group  of 
applications  that  includes  flight  control  systems  for  both  aircraft  and 
spacecraft.  Considerable  effort  is  now  being  devoted  to  the  design  of 
highly  reliable  control  system  components  and  to  the  design  of  fault- 
tolerant  processors  for  online  control  computations.  Despite  the  success  of 
some  of  these  efforts,  the  extremely  high  reliability  goals  that  are 
becoming  commonplace  in  the  Air  Force  and  elsewhere  can  often  be  met  only  by 
designing  control  systems  with  built-in  component  redundancy.  The 
combination  of  a  redundant  system  architecture  and  a  redundancy  management 
(RM)  algorithm  constitutes  a  fault-tolerant  system  design. 

Predicting  the  performance  of  these  designs  is  an  important  and 
difficult problem. The performance is judged by such quantities as the
reliability,  the  availability,  or  some  other  probabilistic  quantity  such  as 
average  measurement  accuracy  or  average  regulation  error.  Calculating  these 
quantities  is  an  important  problem  because  they  represent  the  criteria  by 
which various fault-tolerant system designs are judged. Such calculations
are  difficult  because  fault-tolerant  systems  are  subject  to  random  events, 
such  as  failures  and  RM  decisions,  that  change  the  nature  of  operation  of 
the  system  and  therefore  affect  the  values  of  the  performance  quantities. 

Several  papers  and  theses  have  introduced  the  concept  of  modelling  the 
random  behavior  of  a  fault-tolerant  system  by  generalized  finite-state 
Markov  models  [1-6].  The  states  in  these  models  characterize  the  status  of 
the  system  in  terms  of  the  number  of  components  that  are  operating,  the 
number of these that are failed, and the status of the RM decisions. The
transition behavior among these states must then be derived from the
probabilistic behavior of component failures and of the RM decisions
(including  errors  such  as  false  alarms  and  missed  alarms).  Once  this 
characterization  is  complete,  the  resulting  Markov  model  (or,  more 
generally,  semi-Markov  model)  can  be  used  to  derive  the  statistics  of  any 
relevant  quantity  that  is  dependent  upon  the  status  of  the  system.  Among 
these  are  the  reliability  and  availability  of  the  system,  but  the  statistics 
of  other  quantities  such  as  the  time  to  first  passage  of  a  particular  system 
status  or  a  performance  measure  dependent  on  the  system  state  history  can 
also  be  calculated. 

Despite their obvious utility for fault-tolerant system performance
analysis, these models suffer from one serious drawback that has
considerably limited their use. That drawback is that they tend to be
computationally intractable even for relatively simple fault-tolerant system
architectures. This intractability is the result of a number of factors:

1.  The  number  of  states  can  be  large,  particularly  for  complex  systems 
comprising  many  components.  Essentially,  there  are  as  many  states  in 
the  model  as  there  are  distinct  combinations  of  failed  and  unfailed 
components  and  RM  decision  statuses  for  which  the  system  remains 
operative.  Even  the  exploitation  of  symmetry  and  similar  component 
behavior  to  reduce  the  model  order  can  still  leave  a  very  large  number 
of  states  in  the  final  model. 

2.  The  transient  behavior,  not  the  steady  state  behavior,  is  of  primary 
interest.  Because  the  components  are  subject  to  failure,  the  steady 
state for nearly all fault-tolerant systems is complete failure. Even
when recovery of components is possible, the steady state may not
become established until more time has elapsed than the useful lifetime
of the system (see comment 4 below). In either case, the transient
behavior  becomes  the  behavior  of  interest  and  steady  state  analysis 
techniques  do  not  apply.  This  is  particularly  unfortunate  when  the 
model  is  semi-Markov  in  nature  because  the  transient  analysis  of  such 
processes  requires  the  evaluation  of  convolution  quantities  (integrals 
or sums, respectively, for continuous or discrete time models) that
require  massive  amounts  of  computer  memory  and  computation  time. 

3.  The  time  horizons  of  interest  are  often  very  long  in  absolute  terms, 
though  they  still  remain  short  relative  to  the  time  required  for  the 
process  to  reach  the  steady  state.  Typically,  a  fault-tolerant  system 
will be used for operating intervals that are a significant fraction of
the  expected  lifetime  of  its  most  failure-prone  components.  This 
fraction  seldom  approaches  unity  because  the  redundancy  level  of  these 
components  required  to  satisfy  any  reasonable  specification  on  the 
system  reliability  would  drive  the  price  of  the  system  high  enough  to 
justify the use of fewer, more reliable (and therefore more expensive)
components.  On  the  other  hand,  extremely  short  operating  times  would 
yield  a  probability  of  failure  for  any  component  that  is  so  low  that 
the extra investment in fault-tolerance would not be justified by the
small increase in reliability. In light of 2 above then, the transient
behavior  of  a  Markovian  process  must  be  examined  over  time  horizons  on 
the  order  of  the  mean  time  to  failure  of  the  most  failure-prone 
component.  Given  the  current  emphasis  on  the  manufacture  of  highly 
reliable  components,  these  time  horizons  can  be  extremely  long. 

4.  A  time  scale  separation  tends  to  exist  between  the  component  failure 
process  and  the  RM  decision  process.  Failures  tend  to  occur  only 
rarely and therefore tend to have large time durations between them.
RM decisions, however, must occur quickly following a failure and tend
to occur very rapidly relative to failure events. This means that the
Markovian model of the behavior of the system status exhibits "fast"
modes and "slow" modes. This time scale separation provides the
motivation for the behavioral decomposition methods that are currently
being investigated by us and by other researchers in the field.
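
To give a sense of the growth described in item 1, the following minimal sketch counts status combinations under an assumed bookkeeping in which each component carries one of four statuses (unfailed or failed, crossed with in use or shut off by the RM logic). The four-status bookkeeping and the function names are illustrative assumptions, not taken from the models in [1-6], and the counts are upper bounds because combinations for which the system is inoperative are not removed.

```python
from math import comb


def raw_state_count(n_components: int, statuses_per_component: int = 4) -> int:
    """Status combinations when every component is tracked individually."""
    return statuses_per_component ** n_components


def lumped_state_count(n_components: int, statuses_per_component: int = 4) -> int:
    """Status combinations when identical components are interchangeable.

    Only the number of components in each status matters, so this counts
    multisets of size n_components drawn from the available statuses.
    """
    return comb(n_components + statuses_per_component - 1,
                statuses_per_component - 1)


if __name__ == "__main__":
    for n in (3, 6, 12):
        print(n, raw_state_count(n), lumped_state_count(n))
```

Exploiting symmetry shrinks the count sharply for a single group of identical components, but for a system built from several different component types the lumped counts of the groups multiply, which is one way the model order can remain large.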

The  goal  of  this  research  project  is  to  develop  a  method  that  generates 
approximate  solutions  to  the  generalized  Markov  process  models  that 
characterize  fault-tolerant  system  behavior  without  the  use  of  excessive 
computer  memory  or  computation  time.  The  behavioral  decomposition  alluded 
to  in  Comment  4  above  provides  the  basis  for  the  approach.  However,  the 
nature  of  fault-tolerant  system  models  is  such  that  extensions  to  existing 
theory  are  necessary  in  order  to  exploit  the  decomposition  approach.  These 
extensions  and  the  numerical  verification  of  their  validity  are  the  primary 
results  obtained  from  the  work  reported  here. 

1.2  Previous  and  Related  Work 

A  number  of  researchers  have  addressed  various  aspects  of  the  problem 
of  approximating  the  behavior  of  finite  state  Markov  processes  with  weak 
interactions  between  groups  of  states.  The  most  recent  work  to  appear  on 
this subject is that of Coderch [1]. This paper is derived from [2], which
contains an extensive description of previous work in the area. Much of the
work preceding [1] applied only to limited classes of finite state Markov
processes and, in particular, was not applicable to semi-Markov processes
or  to  processes  with  purely  transient  states.  In  [1],  a  method  is  described 
by  which  continuous  time,  finite  state,  weakly  coupled  Markov  processes 
without  transient  states  can  be  decomposed  into  transition  operators  that 
are valid for increasingly longer time scales. The result is a sequence of
operators  that  describe  the  transition  behavior  of  the  process  at  each  time 
scale  such  that  the  multiple  time  scale  solution  for  the  process  behavior 
converges  to  the  actual  process  behavior  asymptotically  as  the  small 
parameter  representing  the  weak  interactions  converges  to  zero. 
Unfortunately,  the  method  does  not  apply  to  semi-Markov  processes  and  it  has 
not  been  extended  to  apply  to  discrete  time  processes.  Furthermore,  the 
method  requires  the  solution  of  very  complex  linear  algebra  problems,  such 
as  the  description  of  nullspaces  of  operators,  in  the  generation  of  the 
operators  that  are  valid  at  each  time  scale. 

Currently,  an  effort  is  underway  to  extend  the  results  of  [1]  to  finite 
state  Markov  processes  evolving  in  both  discrete  and  continuous  time  that 
include special types of transient states (called "nonsplitting transient
states" in [4]). Some preliminary results of this effort are described in
[3]. Further results are expected soon [4]. It should be noted that the
results in [3] and [4], like those in the previously cited references,
currently  are  applicable  only  to  Markov  processes.  It  is  expected  that  [4] 
will  include  some  results  on  semi-Markov  processes,  but  the  limitations  of 
these  results  remain  to  be  seen. 

It  should  also  be  noted  that  the  methods  of  [3]  and  [4],  like  those  in 
[1,2], generate a description of the behavior of the process in sequentially
longer  time  scales.  It  is  frequently  the  case  in  fault-tolerant  system 
analysis  that  the  behavior  of  interest  occurs  only  in  the  first  time  scale. 
This observation, combined with the difficulty that the methods of [3] and
[4]  have  in  dealing  with  transient  states  and  the  current  lack  of  results 
for  semi-Markov  processes,  suggests  that  an  alternative  method  for  dealing 
with  these  processes  is  of  interest. 

Much of the work reported here is an extension of the work reported by
Korolyuk et al. [5,6]. These results apply to finite state semi-Markov
processes with weak interactions, where the continuous time case is treated
in [5] and the discrete time case in [6]. The interactions between the
states are weak in the sense that the transition behavior depends upon a
small parameter ε such that when ε is zero the process decomposes into
noninteracting classes of states. The form of the transition behavior
assumed by [5,6] is that the transition probabilities within a class include
terms that are independent of ε while the interclass transition
probabilities are all at least first order in ε. Also, it is assumed that
the holding time densities associated with all transitions become compressed
near the origin as ε becomes small. Finally, it is assumed that the
decomposed classes that result from setting ε = 0 are all ergodic. When all
of these conditions are satisfied, it is shown in [5,6] that the behavior of
the original process over time horizons on the order of t/ε can be
approximated by a reduced order Markov process representing the interclass
behavior in this time scale, expanded by the stationary distribution of
probability within a class that results from the ergodicity of each class
when ε is zero. The parameters of the reduced order Markov process are
expressed in terms of the transition probabilities of the original process
and the mean holding times associated with the holding time distributions.

The  results  in  [5,6]  are  very  powerful  for  approximating  the  behavior 
of  semi-Markov  processes  that  satisfy  all  of  the  conditions  in  the  first 
order  time  scale.  Unfortunately,  most  models  of  fault-tolerant  system 
behavior  do  not  satisfy  these  conditions.  This  observation  provides  the 
motivation  for  much  of  the  work  to  be  reported  here. 


In  particular,  fault-tolerant  system  models  tend  to  have  two 
characteristics  that  violate  the  conditions  imposed  on  the  process  by  [5,6]. 
One is that the holding time densities do not compress as the small
parameter representing the weak interclass interactions is made smaller.
The reason for this is that the holding time densities for fault-tolerant
system  models  are  determined  by  the  probability  mass  functions  of  the  time 
needed  for  various  sequential  fault  diagnosis  tests  to  reach  decisions.  The 
behavior  of  the  fault  diagnosis  tests  typically  occurs  in  the  "fast"  time 
scale,  but  it  is  not  altered  by  changes  in  the  failure  rate  of  the 
components,  which  is  usually  the  source  of  the  small  interaction  parameters 
in  these  models.  This  situation  is  illustrated  clearly  by  the  model  derived 
in Chapter 3 of [7], which is the 9-state model referred to in [8]. None of
the holding time densities for this model displays the explicit dependence on
the scaled time t/ε that [5,6] assume (see Appendix C of [7]).

The  other  manner  in  which  fault-tolerant  system  models  often  violate 
the  conditions  assumed  in  [5,6]  is  with  respect  to  the  ergodicity  of  the 
classes when ε = 0. Many fault-tolerant systems include RM logic that shuts
off  a  component  permanently  once  it  has  been  diagnosed  as  failed.  If  this 
diagnosis  is  the  result  of  a  false  alarm,  the  corresponding  system  status 
state  involves  no  failures  and  hence  tends  to  be  in  the  same  class  upon 
decomposition  of  the  model  as  other  no-failure  states  such  as  the  state 
where  no  failures  and  no  RM  decisions  have  yet  taken  place.  But  the  false 
alarm state in this case is a trapping state for this class when the failure
probability (and hence ε) is set to zero. Therefore, this class is
nonergodic. This tends to be true of many of the classes of states
associated with models of fault-tolerant system behavior when irreversible
RM logic is used by the system.


The work that was reported in [8] last year discussed some of the
alternatives that were being investigated for circumventing the problems
associated with applying the results of [5,6] to fault tolerant system
models. In [8], it was noted that the ergodicity of the classes is actually
a stronger condition than is needed for the proofs presented in
[5,6] to hold. In particular, it is sufficient that the inverse operator
$[I - P_k + \Pi_k]^{-1}$ exist, where $P_k$ and $\Pi_k$ are operators that are
associated with the kth class defined in [8, p. 7]. This observation leads to the
interesting but not very useful conclusion that the results of [5,6] can be
extended  to  models  for  which  the  weaker  condition  is  satisfied  by  each 
class. 

It was also reported in [8] that work had begun on circumventing the
problem that the holding time densities for fault tolerant system models are
not dependent on the small parameter representing the weak interactions.
The approach described in [8] was to introduce a second small parameter that
represented time scaling into the model. The holding time densities then
took the appropriate form for application of the results of [5,6] provided
the time scaling parameter was proportional to the original small
interaction parameter. It was speculated that the time-scaled results would
exhibit the asymptotic convergence to the correct behavior implied by the
results of [5,6]. Work had just begun on investigating this speculative
hypothesis for continuous time models.

1.3  Research  Goals  for  the  Year 

The goals for the year of effort reported here were as follows:


1.  Continue  the  extension  of  the  results  of  [5,6]  to  models  evolving  in 
continuous  time  where  the  holding  time  densities  do  not  depend  directly 
upon the small interaction parameter ε but rather on a small time
scaling parameter related to ε.

2.  Conduct  further  investigations  on  nonergodic  models  by  examining  a 
number  of  continuous  time  examples.  Attempt  to  identify  a  theoretical 
result  regarding  such  models. 

3.  Develop  results  similar  to  [5,6]  as  extended  by  the  two  previous  goals 
for  discrete  time  semi-Markov  models  of  fault  tolerant  systems. 

4.  Develop  a  means  for  generating  the  exact  solution  to  models  of  simple 
fault-tolerant  systems  for  the  purpose  of  comparison  with  the  results 
generated  by  the  approximate  technique. 

The  next  section  of  this  report  will  discuss  the  progress  made  on  these 
goals  during  the  past  year. 

II.  PROGRESS  SUMMARY 

In  this  section,  the  work  of  the  past  year  is  summarized  and  is  related 
to  the  goals  that  were  discussed  above.  Numerous  references  are  made  to 
[7],  which  is  the  S.M.  thesis  of  Siu-Kwong  Chu  that  was  completed  under  the 
support  of  this  grant.  This  thesis  is  included  as  Appendix  A  of  this  report 
for  easy  reference. 

2.1  Time-scaling  of  Continuous  Time  Models 

In  [8],  the  idea  was  put  forward  that  when  the  time  axis  over  which  a 
semi-Markov  model  of  fault-tolerant  system  behavior  evolves  is  scaled  by  a 
small parameter δ, the holding time densities in the model take the form
that is required for the application of the asymptotic theorems of [5,6]
provided the parameter δ is proportional to ε. This idea is explained
rigorously  in  section  2.2.1  of  [7].  After  introducing  this  time  scaling,  it 
is  possible  to  rederive  the  results  that  are  of  interest  for  asymptotic 
approximations  to  the  behavior  of  these  semi-Markov  models. 

Let  E  be  the  state  space  of  a  finite  state  semi-Markov  process  that 
evolves  in  continuous  time  t.  Suppose  that  the  process  is  observed  with 
respect to the scaled time t/δ. Suppose further that the transition
operator of the process is such that its (j,i) element representing
transitions from state i to state j has the form:

    $p_{ji} \, F_{ji}(t'/\delta), \quad i, j \in E$

where t' represents scaled time and where the eventual transition
probabilities $p_{ji}$ take the form:

    $p_{ji} = p_{ji}^{(k)} - \epsilon \, q_{ji}^{(k)}, \quad i, j \in E_k$

    $p_{ji} = \epsilon \, q_{ji}, \quad i \in E_k, \ j \notin E_k$

Here it is assumed that the state space E decomposes into weakly interacting
classes $\{E_1, E_2, \ldots, E_n\}$. It is also assumed that the $p_{ji}^{(k)}$ for each
$E_k$ sum to unity, hence when ε = 0 the classes $E_k$ become noninteracting and each
describes a valid semi-Markov process.

Now consider the sojourn time (in scaled time) of the process in class $E_k$
when it begins from state $i \in E_k$ and transits to class $E_r$, and let
$\varphi_{rk}^{(i)}(s)$ denote its characteristic function. Then, if the $p_{ji}^{(k)}$ for
each k represent the transition probabilities of an ergodic Markov chain,
the $\varphi_{rk}^{(i)}(s)$ are independent of the superscript i and they take the
form:


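    $\varphi_{rk}(s) =$ [first expression, built from sums over $i, j \in E_k$ of $\pi_j^{(k)}$, $a_{ji}$, $p_{ji}^{(k)}$ and $q_{ji}$; not legible in this copy, see [7, sec. 2.2.2]]

    $\varphi_{rk}(s) = \dfrac{p_{rk} \, (A_k/a)}{A_k/a + s}$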

where the $\pi_j^{(k)}$ are the stationary probabilities of the ergodic semi-Markov
process associated with class $E_k$ and the $a_{ji}$ are the mean holding times
associated with the $F_{ji}(t)$ in the original time scale. The quantities in
the second expression above are defined in [7, sec. 2.2.2]. Note that this
expression takes the form of the characteristic function of a Markov process
transition operator with eventual transition probability $p_{rk}$ and transition
rate time constant $A_k/a$. Thus, the interclass transitions are Markovian in
scaled time.

The  proof  of  this  result  can  be  found  in  [7,  sec.  2.2.2]. 
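
The Markovian interpretation stated above can be checked with a short worked transform. The sketch below assumes that the characteristic function is the one-sided Laplace transform of the sojourn time density, and it writes $\lambda$ (a shorthand introduced here, not a symbol from [7]) for the transition rate constant named in the text. An exponentially distributed sojourn with rate $\lambda$ that ends in class $E_r$ with probability $p_{rk}$ then gives:

```latex
\int_0^\infty e^{-s t'} \, p_{rk} \, \lambda e^{-\lambda t'} \, dt'
  = p_{rk} \, \lambda \int_0^\infty e^{-(\lambda + s) t'} \, dt'
  = \frac{p_{rk} \, \lambda}{\lambda + s}
```

A transform of this shape therefore corresponds to an exponential holding time, which is the defining property of a continuous time Markov process.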

The  derivation  of  the  result  expressed  above  makes  possible  the 
analysis  of  continuous  time  semi-Markov  models  of  fault  tolerant  system 
behavior  provided  the  model  has  ergodic  classes  (note  the  underlined 
condition  above).  Many  fault  tolerant  system  models  violate  this  condition, 
as  was  discussed  in  the  Introduction.  However,  many  fault-tolerant  systems 
that do not employ irreversible fault isolation logic do produce models with
ergodic classes. Therefore, this result is a positive step toward analysis
of models for these types of systems.


The  manner  in  which  the  result  above  can  be  used  for  such  analyses  is 
as  follows.  Suppose  a  model  for  a  fault  tolerant  system  has  been 
constructed  and  one  is  interested  in  calculating  the  state  probabilities  for 
the  model  at  some  relatively  large  value  of  time  t  in  order  to  assess  the 
reliability  (or  some  other  status-related  property)  of  the  system.  Suppose 
further  that  the  model  satisfies  the  conditions  stated  in  the  result  above. 
Then  the  approximate  class  occupancy  probabilities  at  the  desired  time  can 
be  calculated  by  scaling  time  appropriately,  constructing  the  Markov  process 
that  approximately  governs  interclass  behavior  from  the  result  above  (this 
is  called  the  enlarged  process  in  [7])  and  solving  this  relatively  easy 
Markov  process  problem.  It  is  assumed  here  that  the  initial  condition  is 
known  for  the  state  probabilities  and  therefore  also  for  the  class  occupancy 
probabilities.  The  results  should  be  rescaled  back  to  the  original  time 
scale.  Then,  finally,  the  approximate  state  probabilities  can  be  evaluated 
by  weighting  the  stationary  probability  distribution  associated  with  each 
class when ε = 0 by the appropriate approximate class occupancy probability.
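
The steps above can be sketched on a toy problem. The following is a minimal illustration, assuming a hypothetical four-state continuous time Markov model (exponential holding times, so a special case of the semi-Markov setting) with two classes and made-up rates; it is not the nine-state model of [7]. Because the holding times are exponential, the time scaling and rescaling steps cancel and the sketch works directly in the original time scale. All function names and the value of EPS are assumptions for illustration.

```python
import math

EPS = 1e-3  # assumed small failure-rate parameter


def generator(eps):
    """Rate matrix with a[j][i] = rate from state i to state j."""
    rates = {
        (0, 1): 2.0, (1, 0): 1.0,        # fast transitions inside class 1 = {0, 1}
        (2, 3): 1.0, (3, 2): 3.0,        # fast transitions inside class 2 = {2, 3}
        (0, 2): eps, (1, 3): 2.0 * eps,  # slow failures from class 1 to class 2
    }
    a = [[0.0] * 4 for _ in range(4)]
    for (i, j), rate in rates.items():
        a[j][i] += rate  # flow into j from i
        a[i][i] -= rate  # flow out of i
    return a


def matmul(x, y):
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def expm(a, t, squarings=20, terms=12):
    """exp(a * t) by scaling and squaring with a truncated Taylor series."""
    n = len(a)
    scale = t / 2 ** squarings
    m = [[a[i][j] * scale for j in range(n)] for i in range(n)]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[v / k for v in row] for row in matmul(term, m)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = matmul(result, result)
    return result


def exact_state_probabilities(t, eps=EPS):
    p0 = [1.0, 0.0, 0.0, 0.0]  # start in state 0
    phi = expm(generator(eps), t)
    return [sum(phi[j][i] * p0[i] for i in range(4)) for j in range(4)]


def approximate_state_probabilities(t, eps=EPS):
    # Step 1: stationary distribution inside each class with eps = 0.
    pi_1 = [1.0 / 3.0, 2.0 / 3.0]  # balances 0->1 at rate 2 against 1->0 at rate 1
    pi_2 = [3.0 / 4.0, 1.0 / 4.0]  # balances 2->3 at rate 1 against 3->2 at rate 3
    # Step 2: aggregated class 1 -> class 2 rate, weighted by pi_1.
    rate_12 = pi_1[0] * eps + pi_1[1] * 2.0 * eps
    # Step 3: solve the two-class Markov process (class 2 never returns).
    in_class_1 = math.exp(-rate_12 * t)
    in_class_2 = 1.0 - in_class_1
    # Step 4: weight each stationary distribution by its class probability.
    return [pi_1[0] * in_class_1, pi_1[1] * in_class_1,
            pi_2[0] * in_class_2, pi_2[1] * in_class_2]


if __name__ == "__main__":
    t = 1.0 / EPS
    print("exact ", exact_state_probabilities(t))
    print("approx", approximate_state_probabilities(t))
```

In this toy case the reduced problem has a closed-form solution, whereas the full model requires a matrix exponential; the gap widens quickly as the number of states grows.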

The  derivation  of  the  result  above  and  the  construction  of  the 
approximate  evaluation  method  discussed  in  the  preceding  paragraph  complete 
the work necessary to satisfy Goal 1.

To  illustrate  the  approximate  evaluation  procedure,  a  model  for  a 
generic  fault  tolerant  system  was  constructed  and  solved  using  both  "brute 
force"  numerical  convolution  techniques  and  the  approximate  technique 
described  above.  The  system  consisted  of  three  components  where  at  least 
one  unfailed  component  must  be  available  for  the  system  to  remain  operating. 
It  was  assumed  that  the  failure  diagnosis  algorithm  used  sequential  tests  in 
combination  with  logic  that  is  described  in  detail  in  sec.  3.1  of  [7].  The 
tests were assumed to have second order Erlang distributions for their times
to decision. The logic included the possibility of recovering components
that have previously been diagnosed as failed, thereby leading to a model
that has ergodic classes. The complete model is described in secs. 3.3
through  3.5  and  Appendix  C  of  [7].  The  model  has  9  states  which  decompose 
into  three  classes  when  the  small  failure  rate  is  set  to  zero. 
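
For reference, the standard form of a second order Erlang time to decision is sketched below. The rate parameter $\mu$ is a symbol introduced here for illustration and is not taken from [7]; the transform is the one-sided Laplace transform of the density.

```latex
f(t) = \mu^2 \, t \, e^{-\mu t}, \quad t \ge 0,
\qquad
E[t] = \frac{2}{\mu},
\qquad
\int_0^\infty e^{-s t} f(t) \, dt = \left( \frac{\mu}{\mu + s} \right)^2
```

Nothing in these expressions depends on the component failure rate, which is consistent with the earlier observation that the holding time densities of such models do not compress as the small interaction parameter is reduced.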

The  exact  state  probability  histories  are  obtained  numerically  and  are 
described  in  chapter  4  of  [7].  It  should  be  noted  that  a  very  large  amount 
of  computational  effort  was  required  to  generate  these  exact  solutions.  The 
approximate  model  is  also  constructed  and  solved  in  chapter  4  of  [7].  The 
approximate  solutions  were,  for  the  most  part,  obtained  with  just  the  aid  of 
a  hand  calculator.  Only  when  complete  time  histories  were  desired  was  it 
necessary  to  resort  to  the  use  of  a  computer.  Upon  comparison  of  the 
results,  one  finds  that  the  largest  error  in  the  evaluation  of  any  of  the 
state  probabilities  by  the  approximate  method  for  this  example  is  less  than 
1% of the value obtained by numerical means (which itself is subject to a
small  amount  of  error)  for  times  greater  than  the  longest  mean  holding  time 
of  the  sequential  tests,  where  the  assumed  mean  time  between  failures  is  3 
orders  of  magnitude  longer  than  this. 
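
The cost of the "brute force" route can be seen in a discrete time sketch. The code below is a minimal illustration, assuming a generic semi-Markov chain described by embedded transition probabilities and holding time mass functions; the function name, the data layout, and the two-state example are assumptions for illustration and are not the procedure used in [7]. Note that p[i][j] here is indexed from state i to state j, the transpose of the (j,i) convention used in section 2.1.

```python
def interval_transition_probabilities(p, h, horizon):
    """Return phi with phi[n][i][j] = P(state j at step n | entered i at step 0).

    p[i][j] is the embedded-chain probability of the move i -> j.
    h[i][j][m] is the probability that the holding time before that move
    equals m steps (index 0 is unused and must be 0.0).
    """
    n_states = len(p)

    def hold(i, j, m):
        pmf = h[i][j]
        return pmf[m] if m < len(pmf) else 0.0

    # phi[0] is the identity: no time has elapsed.
    phi = [[[float(i == j) for j in range(n_states)] for i in range(n_states)]]
    for n in range(1, horizon + 1):
        step = [[0.0] * n_states for _ in range(n_states)]
        for i in range(n_states):
            # Probability that the process has not yet left state i by step n.
            left_by_n = sum(p[i][k] * hold(i, k, m)
                            for k in range(n_states) for m in range(1, n + 1))
            step[i][i] += 1.0 - left_by_n
            # Convolution sum over the time m of the first transition i -> k.
            for k in range(n_states):
                for m in range(1, n + 1):
                    weight = p[i][k] * hold(i, k, m)
                    if weight == 0.0:
                        continue
                    for j in range(n_states):
                        step[i][j] += weight * phi[n - m][k][j]
        phi.append(step)
    return phi


# Hypothetical two-state example: a three-step spread of holding times in
# state 0 and a one-step holding time in state 1.
P = [[0.0, 1.0], [1.0, 0.0]]
H = [[[], [0.0, 0.25, 0.5, 0.25]],
     [[0.0, 1.0], []]]

if __name__ == "__main__":
    for n, step in enumerate(interval_transition_probabilities(P, H, 5)):
        print(n, step)
```

Every step revisits the whole stored history, so the work grows with the square of the time horizon and the storage grows linearly with it, which is why long horizons are expensive for this approach.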

These  results  are  very  encouraging,  but  they  are  not  sufficient  to 
conclude  that  the  approximate  technique  always  works  so  well.  In  order  to 
further  investigate  the  properties  of  the  approximate  technique  with  the 
time  scaling  included,  a  number  of  four-state  semi-Markov  models  were 
examined.  These  models  were  chosen  to  reflect  various  characteristics  that 
larger  fault  tolerant  system  models  tend  to  possess.  By  keeping  the 
dimension  at  4,  however,  it  is  possible  to  generate  the  true  behavior  of  the 
model  with  relative  ease  whereas  models  of  larger  dimension  are  extremely 
difficult to solve (recall the comments above regarding the nine-state
model). Even four-state models are difficult enough to solve, however, that
symbolic manipulation was necessary to generate the exact solutions. This
is  true  despite  the  fact  that  none  of  the  holding  time  densities  in  the 
models  were  assumed  to  be  any  more  difficult  than  second  order  Erlang. 

The  five  cases  of  four-state  models  that  were  examined  are  discussed  in 
detail  in  chapter  5  of  [7].  The  approximate  method  produced  very  accurate 
results  in  every  case  that  was  examined.  The  comparison  between  the  results 
was  almost  always  exact  to  4  decimal  places  except  in  the  very  early  time 
periods  before  the  startup  transient  of  the  process  has  decayed. 

One  of  the  cases  of  four-state  models  that  was  examined  was  a  model 
that  did  not  have  ergodic  classes  (Case  IV).  The  fact  that  the  approximate 
technique  still  produced  extremely  accurate  results  suggested  that  we 
investigate  further  the  ergodicity  condition  and  its  impact  on  the  results 
from  which  the  approximate  method  is  derived.  The  work  accomplished  in  this 
area  is  described  in  the  next  section. 

2.2  Relaxation  of  Ergodicity  Condition 

Many  fault  tolerant  systems  yield  generalized  Markovian  models  of  their 
behavior  that  decompose  into  classes  that  satisfy  all  of  the  conditions  for 
applying  the  approximate  technique  except  the  condition  that  they  be  ergodic 
when ε = 0. This is typically the result of irreversible logic structures in
the  RM  algorithm  for  the  system  such  that  diagnostic  decisions  alone  can 
permanently  eliminate  a  component  from  use. 

However,  in  the  analysis  of  four-state  models  discussed  above,  it  was 
noted  that  excellent  results  were  obtained  when  the  approximate  method  was 
applied  to  a  case  where  the  model  did  not  possess  ergodic  classes.  A  single 
example is not sufficient to prove any statement regarding the applicability
of the approximate method to models with nonergodic classes. However, these
results  did  motivate  us  to  examine  the  underlying  reason  that  the  method 
worked  for  this  particular  example. 

The  result  of  this  investigation  is  the  following  theorem  regarding 
models  with  nonergodic  classes: 

Theorem 1: Let a semi-Markov process depend upon ε such that it can be
decomposed in the manner described in section 2.1. Suppose in addition
that the imbedded Markov process transition operator $P_k$ associated with
the kth class when ε = 0 satisfies:

    $\lim_{n \to \infty} \frac{1}{n} \sum_{m=1}^{n} P_k^m = [\, v \ v \ \cdots \ v \,]$

where v is a constant vector, for every k. Then the interclass
transition behavior approaches the same enlarged Markov process behavior
that was described in section 2.1 as ε approaches zero.

The  proof  of  this  theorem  appears  in  chapter  6  of  [7]. 

Theorem  1  considerably  widens  the  class  of  semi-Markov  models  to  which 
the  approximate  technique  can  be  applied  because  the  condition  stated  in  the 
theorem  is  weaker  than  the  ergodicity  of  the  classes  that  was  required  by 
the  previous  results.  Many  fault  tolerant  system  models  possess  the 
properties  stated  in  the  conditions  of  Theorem  1. 

The  analysis  leading  to  Theorem  1  led  us  to  consider  the  specific 
situations  in  which  the  conditions  of  the  theorem  are  satisfied.  This 
investigation  led  to  the  following  refinement  of  the  theorem: 


Theorem 2: Let a semi-Markov process depend on ε such that it can be
decomposed into classes as prescribed in section 2.1. The transition
operator of the imbedded Markov process associated with the kth class
when ε = 0 will satisfy the condition of Theorem 1 if:

1.  The  kth  class  is  ergodic,  or 

2.  Pk  has  one  and  only  one  eigenvalue  of  unity. 

The  proof  of  this  theorem  also  appears  in  chapter  6  of  [7]. 

It  should  be  emphasized  that  Theorem  1  is  still  only  a  sufficient 
condition  for  the  approximate  technique  to  yield  accurate  results  as  the 
small parameter ε becomes small. In other words, there may exist
semi-Markov models that do not satisfy these conditions whose behavior can still
be  approximated  well  by  the  approximate  method.  Theorem  2  provides  a  more 
restrictive  but  more  easily  checked  sufficient  condition. 
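
A numerical check of these conditions can be sketched as follows. The sketch assumes the condition of Theorem 1 is read as the averaged powers of the class operator approaching a matrix whose columns are all equal, with operators stored so that entry [j][i] is the probability of moving from state i to state j. The three example operators, the number of terms, and the tolerance are illustrative assumptions. The eigenvalue count relies on the standard fact that, for a stochastic matrix, the limit of the averaged powers is the projection onto the eigenvectors of eigenvalue one, so its trace equals the number of unit eigenvalues.

```python
def matmul(x, y):
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def averaged_powers(p, n_terms=2000):
    """Approximate lim (1/n) * sum_{m=1..n} p^m with a finite number of terms."""
    n = len(p)
    power = [row[:] for row in p]  # p^1
    total = [[0.0] * n for _ in range(n)]
    for _ in range(n_terms):
        total = [[total[i][j] + power[i][j] for j in range(n)] for i in range(n)]
        power = matmul(power, p)
    return [[v / n_terms for v in row] for row in total]


def columns_agree(m, tol=1e-2):
    """True when every column of m matches the first column within tol."""
    n = len(m)
    return all(abs(m[j][i] - m[j][0]) < tol for j in range(n) for i in range(n))


def unit_eigenvalue_count(p):
    """Estimate the number of eigenvalues equal to one from the trace of the limit."""
    limit = averaged_powers(p)
    return round(sum(limit[i][i] for i in range(len(p))))


# Columns are "from" states, rows are "to" states; every column sums to one.
ERGODIC = [[0.5, 0.2],
           [0.5, 0.8]]
ONE_TRAP = [[0.9, 0.0],      # state 1 is a single trapping state
            [0.1, 1.0]]
TWO_TRAPS = [[0.90, 0.0, 0.0],   # states 1 and 2 are both trapping states
             [0.05, 1.0, 0.0],
             [0.05, 0.0, 1.0]]

if __name__ == "__main__":
    for name, p in (("ergodic", ERGODIC), ("one trap", ONE_TRAP),
                    ("two traps", TWO_TRAPS)):
        print(name, columns_agree(averaged_powers(p)), unit_eigenvalue_count(p))
```

Under these assumptions the class with a single trapping state passes the check even though it is not ergodic, while the class with two trapping states fails it, matching the distinction drawn in the text.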

Some  examples  of  models  that  do  and  do  not  satisfy  the  sufficient 
conditions  of  Theorem  1  are  presented  in  chapter  6  of  [7].  One  example  in 
particular  that  does  not  satisfy  the  conditions  includes  a  class  that 
contains multiple trapping states when ε = 0. We have begun an effort to
extend  the  results  to  this  case  as  well  by  searching  for  conditions  under 
which  the  approximate  method  succeeds  in  approximating  the  interclass 
behavior. 

The  derivation  of  the  two  theorems  discussed  above  represents  our 
progress thus far on Goal 2.

2.3  Discrete  Time  Models 

All  of  the  results  described  so  far  in  this  report  have  applied  to 
continuous  time  models  of  fault  tolerant  system  behavior.  However,  because 
the RM algorithm for the system is usually implemented on a digital computer
with a significant time delay between successive applications of the
diagnosis  tests,  fault  tolerant  system  models  are  often  purely  discrete  time 
in  nature.  Efforts  have  been  made  during  the  past  year  to  derive  results 
for  discrete  time  processes  that  mimic  those  discussed  above  for  continuous 
time  processes.  This  section  reports  on  these  efforts. 

Much  of  the  work  that  has  been  accomplished  this  year  for  discrete  time 
models has related to the adaptation of Korolyuk's limit theorem for
semi-Markov processes [5] to semi-Markov chains. In addition, a limit theorem
with  time  scaling  for  semi-Markov  chains  was  also  developed.  The  theorem 
statements  are  summarized  below. 

An  important  result  that  will  be  referred  to  in  both  theorems  discussed 
is  presented  in  Lemma  3. 


LEMMA 3: Let $P^{(k)} = [p_{ji}^{(k)}]$ represent an imbedded Markov chain operator of
a semi-Markov chain on $E_k$. Consider the system of equations below:

[equation illegible in the source; it involves a sum over $j \in E_k$]

The solution of the system of equations is independent of the
superscript, that is:

[equation illegible in the source; it holds for all $i \in E_k$]

if and only if the imbedded Markov chain operator represented by the
transition probability matrix $\{p_{ji}^{(k)} \mid i,j \in E_k\}$ has at most a single unit
magnitude eigenvalue.


Thus, any ergodic imbedded Markov chain operator (for which all
eigenvalues other than the unit eigenvalue have less than unit magnitude)
will satisfy the condition of Lemma 3. In addition, any monodesmic imbedded
Markov chain operator (one that has only one trapping or absorbing state,
and hence a single unit magnitude eigenvalue) will also satisfy the
condition of Lemma 3. This assertion is similar to Theorem
2 for continuous time models.
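
The eigenvalue condition can be screened structurally before any numerical eigenvalue computation. The sketch below is an illustration only, not part of the reported work. It relies on the standard fact that, for a finite stochastic matrix, the multiplicity of the eigenvalue 1 equals the number of closed communicating classes. The function name `closed_classes` and the two example matrices are hypothetical. The check does not detect other unit magnitude eigenvalues, which arise when a closed class is periodic.

```python
def closed_classes(P, tol=1e-12):
    """Return the closed communicating classes of a column-stochastic matrix.

    P[j][i] is the probability of moving from state i to state j, so each
    column sums to one (the p_ji index convention used in this report).
    """
    n = len(P)
    # reach[i] = set of states reachable from i (including i itself)
    reach = []
    for i in range(n):
        seen, stack = {i}, [i]
        while stack:
            s = stack.pop()
            for j in range(n):
                if P[j][s] > tol and j not in seen:
                    seen.add(j)
                    stack.append(j)
        reach.append(seen)
    classes = []
    for i in range(n):
        # i lies in a closed class iff every state reachable from i can return to i
        if all(i in reach[j] for j in reach[i]):
            cls = frozenset(reach[i])
            if cls not in classes:
                classes.append(cls)
    return classes


# Hypothetical monodesmic example: states 0 and 1 are transient, state 2 traps.
P_mono = [[0.0, 1.0, 0.0],
          [0.9, 0.0, 0.0],
          [0.1, 0.0, 1.0]]

# Hypothetical example with two trapping states (1 and 2).
P_two_traps = [[0.8, 0.0, 0.0],
               [0.1, 1.0, 0.0],
               [0.1, 0.0, 1.0]]

n_mono = len(closed_classes(P_mono))
n_two = len(closed_classes(P_two_traps))
```

A class with exactly one closed communicating class has a single eigenvalue equal to one; the second matrix has two closed classes, which corresponds to the multiple trapping state case that the theorems do not cover.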

The  following  theorem  describes  how  a  semi-Markov  chain  which  is 
dependent  on  a  small  parameter  e  can  be  approximately  described  by  a  Markov 
chain.  This  theorem  is  derived  based  on  the  results  for  semi-Markov 
processes  in  [5]. 

Semi-Markov chains are characterized by a finite set of states and by a
distribution of the holding time or sojourn time in each state that is
arbitrary for each state to which a transition can occur. A semi-Markov
chain specializes to a Markov chain when the holding times for each state
are identically geometrically distributed (the discrete time counterpart of
the exponential distribution). The semi-Markov chains here are
assumed to depend on a small parameter ε such that the state space can be
decomposed into disjoint classes of states where the probabilities of
departure from each class tend to zero along with ε. In addition, the total
sojourn in each class is assumed to have a non-degenerate distribution in
the limit as ε → 0.

THEOREM 4: A Limit Theorem for Semi-Markov Chains

Let the set E of states of the semi-Markov chain be expressible as a
sum of disjoint classes

$$E = \sum_{k=1}^{N_\varepsilon} E_k, \qquad k \in M = \{1, 2, \ldots, N_\varepsilon\} \qquad (2.1)$$

Let $Y_{rk}$ be the sojourn of the semi-Markov chain in class $E_k$ when it
starts from state i and moves to class $E_r$. The following two
conditions are assumed to hold:


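1. The elements of the core matrix sequence $\{g_{ji}(m) \mid i,j \in E\}$ specifying
the semi-Markov chain depend as follows on the small parameter ε:

$$g_{ji}(m) = p_{ji}\, h_{ji}(m/\varepsilon) \qquad (2.2)$$

where $h_{ji}(0) = 0$. The $p_{ji}$ may be expanded in a Taylor series
about ε = 0. Taking only linear terms in ε:

$$p_{ji} = \begin{cases} p_{ji}^{(k)} - \varepsilon\, q_{ji}^{(k)} + \cdots + o(\varepsilon), & i,j \in E_k \\ \varepsilon\, q_{ji}^{(kr)} + \cdots + o(\varepsilon), & i \in E_k,\ j \in E_r \end{cases} \qquad (2.3)$$

The imbedded Markov chain obeys the usual Markov chain properties:

$$\sum_{j \in E_k} p_{ji}^{(k)} = 1; \quad p_{ji}^{(k)} \in [0,1]; \quad \forall\, i,j \in E_k;\ \forall\, k \in M \qquad (2.4)$$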

and

2. The imbedded Markov chains defined by the transition probability
matrices $\{p_{ji}^{(k)} \mid i,j \in E_k\ \forall\, k \in M\}$ are ergodic with stationary
probabilities $\{\pi_i^{(k)} \mid i \in E_k\ \forall\, k \in M\}$.


Then: 


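$$\lim_{\varepsilon \to 0} \Pr\{Y_{rk} < t\} = \hat{p}_{rk}\,[1 - \exp(-\Lambda_k t/T)] \qquad (2.5)$$

where:

$$\hat{p}_{rk} = \frac{\sum_{i \in E_k} \pi_i^{(k)} q_i^{(kr)}}{\sum_{i \in E_k} \pi_i^{(k)} q_i^{(k)}}$$

[the remaining definitions, including that of $\Lambda_k$, are illegible in the
source; the last legible fragment is a sum over j and m involving $h_{ji}(m)$]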

Although the above theorem is useful, it is not directly
applicable to most fault tolerant system models for two reasons: (1)
the imbedded Markov chains for such models are usually non-ergodic, and
(2) the holding time density functions usually depend not on m/ε
but only on m. Hence, the above theorem must be adjusted in two ways:
the conditions that must be satisfied by the imbedded Markov chain must
be determined (hence Lemma 3), and time scaling must be incorporated
into Theorem 4.
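
As an illustration of the quantities that enter a limit theorem of this kind, the following sketch computes the stationary probabilities of an ergodic imbedded chain by power iteration and then forms stationary-weighted exit probabilities toward other classes. It is a sketch under stated assumptions, not the procedure used in this work: the ratio form of the aggregated probabilities is the usual one for Korolyuk-type results and is assumed here, and the names `stationary_distribution`, `aggregated_exit_probabilities`, `q_exit`, and all numerical values are hypothetical.

```python
def stationary_distribution(P, iters=10_000, tol=1e-13):
    """Power iteration for pi = P pi, where P[j][i] = Pr{next = j | current = i}.

    Assumes the chain is ergodic (irreducible and aperiodic), so the
    iteration converges to the unique stationary distribution.
    """
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        new = [sum(P[j][i] * pi[i] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi


def aggregated_exit_probabilities(pi, q_exit):
    """Stationary-weighted probabilities of leaving a class toward each other class.

    q_exit[r][i] is an assumed first-order (in epsilon) coefficient for a
    transition from state i of this class into class r.
    """
    weights = {r: sum(p * q for p, q in zip(pi, qs)) for r, qs in q_exit.items()}
    total = sum(weights.values())
    return {r: w / total for r, w in weights.items()}


# Hypothetical two-state ergodic imbedded chain (columns sum to one).
P_k = [[0.5, 0.2],
       [0.5, 0.8]]
pi_k = stationary_distribution(P_k)

# Hypothetical exit coefficients: state 0 leaks to class "r1", state 1 to class "r2".
p_hat = aggregated_exit_probabilities(pi_k, {"r1": [1.0, 0.0], "r2": [0.0, 1.0]})
```

For this example the stationary distribution is (2/7, 5/7), so the two exit probabilities are 2/7 and 5/7. For a non-ergodic class the power iteration need not converge to a unique limit, which is one reason the ergodicity condition has to be revisited.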


THEOREM  5:  A  Limit  Theorem  With  Time  Scaling  for  Semi-Markov  Chains 

Let the set E of states of the semi-Markov chain be expressible
as a sum of disjoint classes

$$E = \sum_{k=1}^{N_\varepsilon} E_k, \qquad k \in M = \{1, 2, \ldots, N_\varepsilon\}$$

Let $Y_{rk}$ be the sojourn of the semi-Markov chain in class $E_k$ when it
starts from state i and moves to class $E_r$. Let the following two
conditions  hold  for  the  semi-Markov  chain: 

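1. The elements of the core matrix sequence $\{g_{ji}(m) \mid i,j \in E\}$ specifying
the semi-Markov chain depend as follows on the small parameter ε:

$$g_{ji}(m) = p_{ji}\, h_{ji}(m) \qquad (2.2)$$

where $h_{ji}(0) = 0$. The $p_{ji}$ may be expanded in a Taylor series
about ε = 0. Taking only linear terms in ε:

$$p_{ji} = \begin{cases} p_{ji}^{(k)} - \varepsilon\, q_{ji}^{(k)} + \cdots + o(\varepsilon), & i,j \in E_k \\ \varepsilon\, q_{ji}^{(kr)} + \cdots + o(\varepsilon), & i \in E_k;\ j \notin E_k\ (j \in E_r) \end{cases} \qquad (2.3)$$

The imbedded Markov chain obeys the usual Markov chain properties:

$$\sum_{j \in E_k} p_{ji}^{(k)} = 1; \quad p_{ji}^{(k)} \in [0,1]; \quad \forall\, i,j \in E_k;\ \forall\, k \in M \qquad (2.4)$$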

and

2. The imbedded Markov chains defined by the transition probability
matrices $\{p_{ji}^{(k)} \mid i,j \in E_k\ \forall\, k \in M\}$ have at most a single unit magnitude
eigenvalue (hence, are ergodic or monodesmic) with stationary
probabilities $\{\pi_i^{(k)} \mid i \in E_k\ \forall\, k \in M\}$.

Then: 


$$\lim_{\varepsilon \to 0} \Pr\{Y_{rk} < t\} = \hat{p}_{rk}\,[1 - \exp(-\Lambda_k t/aT)] \qquad (2.5)$$

where $Y_{rk}$, $\hat{p}_{rk}$, $\Lambda_k$, $q_i^{(k)}$, $q_i^{(kr)}$, and $a_i$ were all defined in Theorem 4 and
a is defined below.


The  results  of  Theorem  5  are  being  applied  to  examples  of  fault 
tolerant  control  systems  for  which  semi-Markov  chain  reliability  models  have 
been  derived.  Three  simple  reliability  models  have  been  developed  to  date. 
The first is for a simple component monitoring system. A single
non-essential component has a sequential test monitoring faults for the
information of the pilot. This produces a 3-state model that can be
decomposed into two classes. The second model is of a single-component dual
redundant (SCDR) system. This model has six states and three non-ergodic
classes. When a false alarm recovery test is incorporated into the second
system, a model with nine states and three ergodic classes results.

These  three  models  will  be  analyzed  by  applying  the  results  of  Theorem 
5.  The  probabilities  of  occupying  each  class  will  be  computed  and  will  be 
compared to a numerical or analytical computation of the same quantities.

This work and its continuation represent our progress so far on
Goal 3.

2.4  Generation of Exact Results

When  approximate  answers  are  derived  to  problems  for  which  it  is 
difficult  or  impossible  to  generate  the  exact  answer,  a  question  arises 
regarding  the  means  by  which  these  approximate  answers  can  be  validated. 
Obviously, it is the intent of the problem-solver to avoid the difficult
procedure of generating an exact answer. Yet, without the exact answer, how
can one be certain that the approximate answer is accurate? We face that
dilemma here in calculating our approximate answers to fault tolerant system
model behavior.

In  section  2.1,  we  limited  our  consideration  of  model  structures  to 
four-state  models  with  Erlangian  holding  times  so  that  we  could  generate  the 
exact  answers  relatively  easily.  In  fact,  as  we  discussed  in  section  2.1, 
it  was  still  necessary  to  use  a  symbolic  manipulation  program  to  derive  the 
true  results  because  the  numerical  calculations  were  cumbersome. 

Discrete  time  models  of  fault  tolerant  system  behavior  tend  to  be  just 
as  cumbersome.  In  selecting  the  three  models  of  fault  tolerant  system 
behavior  to  analyze,  we  have  been  careful  to  choose  simple  ones.  This 
allows  us  to  analyze  their  behavior  analytically  before  applying  the 
approximate  technique. 

In  this  regard,  our  efforts  have  been  directed  toward  using  a  symbolic 
manipulation  package  (MACSYMA)  to  obtain,  in  closed  form,  the  z-transform 
solution  to  the  discrete  time  models  (that  is,  an  expression  for  the  state 
occupancy  probability  vector).  From  the  analytical  solution,  a  truncated 
Taylor  series  expression  in  e  can  be  found  that  can  be  compared  with  the 
results  of  applying  Theorem  5.  This  will  provide  an  expression  for  the 
first  truncated  term  of  the  Taylor  series  and  thus  will  provide  an  error 
bound  on  the  approximation  for  these  models. 

In the proof of Theorem 5, all order ε² terms in the total probability
equation are ignored. The resulting expression contains zero order and first
order terms in ε. The zero order term is shown to vanish in the limit as t
approaches zero. The remaining first order ε term is left and ε may be
cancelled, leaving the Theorem 5 result. However, a first order
perturbation of the Theorem 5 result can be obtained by expanding the total
probability equation to second order in ε and ignoring order ε³ terms.
Again, the zero order term vanishes in the limit. With the remaining terms,
$\Phi_{rk}(z)$ is found, but the new expression contains terms proportional to ε.
Including this perturbation term in ε should improve the numerical results
that can be obtained for the class occupancy probabilities. This will be
discussed in future progress reports.

This  constitutes  the  progress  we  have  made  so  far  on  Goal  4. 

III.  PAPERS  AND  PRESENTATIONS 

No  papers  were  derived  from  this  work  during  this  year.  However,  a 
paper  is  in  progress  based  upon  the  work  reported  in  sections  2.1  and  2.2 
that  will  be  submitted  to  an  archival  journal,  probably  Mathematics  of 
Operations  Research.  Also,  one  S.M.  thesis  was  completed  this  year,  namely 
that  of  Siu-Kwong  Chu.  This  thesis  [7]  is  included  here  as  Appendix  A. 

A  presentation  on  this  work  and  other  fault  tolerant  system  evaluation 
work was given by Prof. Walker at NASA-Langley Research Center in March. In
addition,  Prof.  Walker  has  been  invited  to  speak  as  part  of  an  aerospace 
systems  workshop  at  the  American  Control  Conference  in  Seattle  in  June. 

IV.  PROJECTIONS  FOR  THIRD  YEAR  OF  WORK 

During  the  third  year  of  work,  the  goals  of  the  program  are  those  that 
were  stated  in  the  renewal  proposal.  These  are: 

1.  Investigate  the  possible  further  weakening  of  the  conditions  sufficient 
for  the  validity  of  the  approximate  results  for  continuous  time  models 
beyond  the  Theorems  of  section  2.2.  Our  primary  emphasis  here  will  be 
continuous  time  models  for  which  at  least  one  of  the  classes  of  the 
nonperturbed  process  contains  more  than  one  trapping  state. 


2.  Continue  the  derivation  of  analogous  results  for  purely  discrete 
parameter  models. 

3.  Complete  the  symbolic  derivation  of  analytical  solutions  for  the  three 
models  described  in  section  2.4.  Use  the  results  to  find  either  an 
alternative  form  for  the  discrete  time  approximate  results  or  an  error 
bound  in  terms  of  e  on  the  approximate  results.  Generalize  the  error 
bound,  if  possible. 

4.  Use  the  sampled  Monte  Carlo  techniques  of  [9]  to  generate  valid  "truth" 
results  with  which  the  approximate  results  can  be  compared. 

V.  FINANCIAL  AND  MANPOWER  STATUS 

The  manpower  complement  remained  unchanged  from  the  proposal. 

Professor  Bruce  K.  Walker  continues  as  the  Project  Director,  devoting 
approximately  20%  of  his  academic  year  time  and  60%  of  his  summer  time  to 
the project. The two graduate students, Siu-Kwong Chu and Norman M.
Wereley, continue as full-time graduate Research Assistants supported by the
project.  Margaret  McCabe  provides  clerical  assistance.  No  changes  are 
anticipated  from  the  manpower  arrangement  proposed  in  the  renewal  proposal. 

The  financial  aspects  of  the  project  have  also  followed  the  proposal 
closely  with  one  exception.  The  cost  underrun  from  the  first  year  was  added 
to  the  second  year  budget,  partly  as  capital  equipment  funds.  Air  Force 
approval  was  given  for  this  change  by  Capt.  Dwight  McGhee  in  a  letter  dated 
11  December  1985.  The  capital  equipment  money  was  used  to  purchase  an  IBM 
Personal  Computer  Model  AT,  which  is  now  the  primary  means  of  computation 
and word processing for all three participants in the grant.

Abstract


Problems associated with evaluating the state probability histories of large state
space models of fault-tolerant systems are explained, and it appears that Korolyuk's
Limit  Theorem  for  semi-Markov  processes  may  be  a  solution  to  these  problems  that 
approximates  the  aggregated  original  semi-Markov  process  by  a  reduced  order 
Markov  process.  The  Theorem  is  modified  and  extended  to  apply  to  approximate 
fault-tolerant  system  models  in  a  new  time  scale.  The  approximate  technique  is 
then  developed  by  expanding  the  approximate  Markov  process  state  probability 
histories  with  the  stationary  probability  distributions  associated  with  the 
aggregated  groups  of  states  of  the  original  semi-Markov  process.  The  technique  is 
demonstrated  with  a  realistic  9-state  model  and  five  4-state  models  which  mimic 
the  class  to  class  transition  structure  of  typical  fault-tolerant  system  models,  and 
the  results  show  that  accurate  approximation  is  achieved  for  these  examples  after  a 
short  transient  period.  In  addition,  the  ergodicity  sufficient  condition  imposed  on 
the  semi-Markov  process  to  be  approximated  is  relaxed.  As  a  result  fault-tolerant 
system  models  with  certain  types  of  non-ergodic  classes  can  also  be  solved  by  the 
approximate technique.

Notation

$E$    state space of semi-Markov process

$E_k$    k-th partition or class of state space of a semi-Markov process

$F_{ji}(\cdot)$    cumulative probability distribution function for time to transition
    from state i to state j

$h_{ji}(\cdot)$    holding time probability density function for transitions from state
    i to state j

$H(\cdot)$    holding time probability density function matrix

$p_{ji}$    eventual transition probability from state i to state j of perturbed
    semi-Markov process

$p_{ji}^{(k)}$    eventual transition probability from state i to state j in class k of
    non-perturbed semi-Markov process

$\hat{p}_{rk}$    eventual transition probability from aggregated "state" k to
    aggregated "state" r of the approximate Markov process

$P$    eventual transition probability matrix

[symbol illegible]    transition kernel matrix

[symbol illegible]    total probability in class i of perturbed semi-Markov process

[symbol illegible]    kernel element for transition from state i to state j of perturbed
    semi-Markov process

${}^{>}W(\cdot)$    waiting time greater than (·)

$\delta$    time scaling factor

$\varepsilon$    small parameter representing the constant failure rate

[symbol illegible]$_{rk}(\cdot)$    transition kernel for class k to class r transitions of the
    approximate Markov process

$\Phi(\cdot)$    interval transition probability matrix

$\lambda_q$    parameter for false alarm decision time probability density function

$\lambda_j$    parameter for isolation decision time probability density function

$\lambda_{W0}/\lambda_{W1}$    parameters for failed/unfailed indication decision time probability
    density function of the self-test given the component is working

$\lambda_{p0}/\lambda_{p1}$    parameters for failed/unfailed indication decision time probability
    density function of the self-test given the component is failed

$\Lambda_k$    constant transition rate out of aggregated "state" k of the
    approximate Markov process

$\pi_i(\cdot)$    probability in state i of semi-Markov process

$\pi^{(k)}(\cdot)$    total probability in class k of the original semi-Markov process

$\pi_k^*(\cdot)$    probability in state k of the approximate Markov process (or
    enlarged process), i.e., approximate total probability for class k of
    the original semi-Markov process

$\pi_i^{(k)}$    stationary probability in state i which belongs to class k of the
    non-perturbed semi-Markov process

[symbol illegible]    stationary probability in state i of the imbedded non-perturbed
    Markov process for class k of the semi-Markov process

$\bar{\tau}_{ji}$    mean holding time of transition from state i to state j

$\bar{\tau}_i$    mean holding time in state i without regard to the destination

$\bar{\tau}$    mean holding time of a semi-Markov process

$Y_{rk}$    the sojourn random variable of the semi-Markov process in $E_k$

Chapter 1
Introduction


1.1  Background 

A  fault-tolerant  control  system  is  a  system  designed  with  redundant  capacity 
to  perform  its  mission.  That  is,  it  can  do  its  job  using  more  than  one  configuration 
of  its  components,  e.g.  sensors  and  actuators  and  information  processing 
capability.  The  on-line  detection  and  isolation  of  failed  components  and  the 
reconfiguration  of  the  system’s  architecture  is  performed  by  the  system’s 
Redundancy Management (RM) scheme. The fault-tolerant approach enhances
system  reliability  and  performance.  There  are  many  application  areas  where  ultra- 
high  system  reliability  is  necessary  or  desirable.  One  such  area  is  the  control  of 
nuclear  power  plants  where  the  consequences  of  improper  control  system  behavior 
may  be  serious  indeed.  There  are  space  missions  for  which  the  desired  operational 
lifetime  of  the  spacecraft  is  many  years.  The  air  traffic  control  system  and  many 
military  systems  are  also  subject  to  very  high  reliability  requirements.  There  is 
also  a  desire  for  increased  reliability  in  computerized  banking  systems,  chemical 
process  control  systems,  medical  monitoring  systems,  transportation  systems,  and 
many more. As a result, growing attention is being given to the design of
components for long life, to quality control during manufacture, and to testing and
maintenance policies which enhance reliable system operation. Despite these efforts
to improve the reliability of individual components, the resulting system reliability
is still often inadequate for some reliability requirements. As a result, there is
increasing interest in fault-tolerant system designs which allow components to fail
but still provide a means for the system to continue to function.

The  growing  use  of  fault-tolerant  system  designs  has  in  turn  spurred  interest 
in  methods  for  assessing  the  reliability  and  performance  of  such  systems.  The 
traditional  methods  of  reliability  evaluation  are  based  on  combinatorial  analysis  of 
combinations of component failures [7]. They generally consider only the
probabilistic  occurrences  of  component  failures  and  seldom  account  for  the 
probabilistic  nature  of  the  outcomes  of  any  on-line  monitoring  test  that  might  be 
used  by  the  fault-tolerant  system  in  an  effort  to  detect  and  identify  such  failures 
and  to  reconfigure  the  system  to  remove  from  use  any  failed  components.  In 
addition,  classical  reliability  analysis  produces  as  its  sole  result  the  probability  that 
the  system  will  maintain  its  integrity  over  the  duration  of  its  operating  time.  No 
information  is  provided  on  the  performance  of  the  system  during  the  transient 
period  of  the  mission. 

Since  classical  reliability  analysis  fails  to  quantify  fault-tolerant  system  time 
behavior,  other  alternatives  must  be  considered.  Naturally,  in  this  age  of  the  high- 
power  main-frame  computer,  Monte  Carlo  simulation  is  one  option.  This  method 
consists  of  building,  with  a  computer  program,  a  probabilistic  model  of  the  system 
under  investigation.  If  the  system  of  interest  is  properly  modeled  for  various 
random  effects  that  bear  on  it  and  sufficient  simulation  runs  are  obtained,  then 
essentially  any  aspect  of  the  system  performance  can  be  statistically  evaluated  from 
the simulations. However, as is pointed out in [10], the drawback of the Monte Carlo
technique  stems  from  the  fact  that  a  sufficient  number  of  simulations  must  be 
available.  For  a  system  with  a  component  failure  rate  as  low  as  10'?  per  sec.,  the 
number  of  simulations  needed  to  generate  statistically  significant  results  about 
failures  must  exceed  one  billion.  Furthermore,  the  fault-tolerant  system  to  be 
simulated is frequently rather complex, often involving multiple instruments and a
hierarchical architecture for the Failure Detection and Isolation (FDI) logic.
Consequently, obtaining reliable results by the Monte Carlo technique is often
prohibitively costly in terms of the required computational effort.

The  use  of  Markov  chain  theory  [9,  6]  has  shown  promise  as  a  means  for 
evaluating  the  performance  of  those  fault-tolerant  systems  which  employ  FDI  tests 
that  are  of  the  single  sample  variety,  that  is,  the  information  that  is  used  for  FDI  is 
gathered  and  discarded  at  each  time  sample.  However,  single  sample  FDI  tests 
generally  have  a  relatively  high  likelihood  of  decision  errors,  particularly  in  noisy 
signal  environments.  In  such  situations,  fault-tolerant  systems  are  always  equipped 
with  digital  computers  that  execute  FDI  tests  based  on  several  samples  of  the 
monitoring  data  at  each  time  sample.  Such  tests  include  moving  window  tests  and 
tests  of  a  completely  sequential  nature.  Such  tests  are  not  memoryless.  Therefore, 
the  systems  in  which  they  are  employed  are  not  conducive  to  the  compact 
treatment  by  the  application  of  Markov  chain  analysis  that  is  possible  for  systems 
employing  only  single  sample  tests. 

The  Markov  modeling  technique  mentioned  in  the  previous  paragraph  must 
be generalized in order to capture the non-memoryless nature of the sequential RM
strategy employed in many fault-tolerant systems. More specifically, the model
must account for the time delays associated with processing a sequence of
observations before an FDI decision is made. Some effort has been made to analyze
such systems and it appears that the generalized Markovian (or semi-Markov)
modeling methods [10, 8] are applicable to some systems of this type. In addition
to the necessary assumptions, a problem with this reliability evaluation method is
that the large number of states in the model causes the computation of results to
involve excessive amounts of computer storage and computation time. (Usually,
each state in a generalized finite-state semi-Markovian model of fault-tolerant
system behavior represents a particular combination of specific component failure
modes and of RM decisions.) The reason for this for both continuous time and
discrete time models is as follows: For quantitative continuous time system
performance analysis, the state probability distribution π(t) at every time t must be
evaluated. With known π(0), standard time-invariant semi-Markov theory yields,

$$\pi(t) = \Phi(t)\,\pi(0) \qquad (1.1)$$

where Φ(t), the interval transition probability matrix, is the solution of the following
matrix convolution integral equation (see appendix A.2 for the details of the
derivation, and the notations),

$$\Phi(t) = {}^{>}W(t) + \int_0^t \Phi(t-\tau)\,[P \circ H(\tau)]\,d\tau, \qquad \Phi(0) = I \qquad (1.2)$$

The above equation is in a form that can be solved analytically by the Laplace
transform technique. It is not difficult to obtain Φ(t) in closed form for systems
that comprise only two or three states. However, for complex systems with a large
number of states (for example, the model for a dual-redundant engine controller has
30 states [2]; flight control system models will have many more), it will become
intractable to obtain a closed form solution even with the help of symbolic
manipulation software, e.g. MACSYMA. The reason is as follows: Solving Eq. (1.2)
for Φ(t) involves the problem of inverting an N x N matrix symbolically, where N is
the number of states of the system model. Unlike the case in numerical analysis
where the number of operations required for a matrix inversion is on the order of
$N^3$, in symbolic inversion the number of operations for an N x N matrix whose
elements are as simple as a single term function of s is on the order of N!. It should
also be pointed out that a symbolic operation is also more complicated than its
counterpart in numerical operations, which is usually a floating point
multiplication. In addition, the computer memory required for storing intermediate
expressions is extremely large. So the problems associated with memory storage
and computation time prohibit the use of a symbolic manipulation program in
solving for Φ(t) analytically for a continuous time model. On the other hand, for
discrete-time semi-Markov models, π(k), the state probability distribution at time
step k with known π(0), can be expressed as,

$$\pi(k) = \Phi(k)\,\pi(0) \qquad (1.3)$$

where Φ(k) is recursively generated by (see appendix A.1 for the details of
derivation and notations),

$$\Phi(k) = {}^{>}W(k) + \sum_{m=0}^{k} \Phi(k-m)\,[P \circ H(m)], \qquad \Phi(0) = I \qquad (1.4)$$

It can be seen that a convolution sum is involved. This implies that, for a system
with N states, approximately $2kN^2$ values must be stored in order to compute Φ(k)
and hence π(k). For N = 20 and k = 100,000, as might be the case for a simple
flight control system operating with RM updates at a rate of 50 Hz for 35 minutes,
the storage required is approximately $80 \times 10^6$ values or 640 megabytes of storage for
accurate single precision state probability distribution calculations. The number of
floating point multiplications required for calculating Φ(100,000) is approximately
$7 \times 10^{12}$. This poses the same problem as the continuous time model. These
computational and memory burden problems encountered in the reliability and
performance analysis of complex fault-tolerant systems employing non-memoryless
FDI tests provide the motivation for the work described in this thesis.
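
The recursion for the interval transition probability matrix and the storage estimate above can be made concrete with a small sketch. This is an illustration only, not the code used in the thesis: the function names, the two-state example, and the column convention (entry [j][i] refers to the transition from state i to state j) are assumptions made here. The 8 bytes per stored value used in `storage_bytes` is inferred from the quoted figures (640 megabytes for 80 million values).

```python
def interval_transition_matrices(P, H, K):
    """Generate Phi(0), ..., Phi(K) from Phi(k) = >W(k) + sum_m Phi(k-m) [P o H(m)].

    P[j][i]    : probability that the next state is j given the current state i
                 (columns sum to one).
    H[m][j][i] : probability that the i -> j transition takes exactly m steps,
                 for m = 0..M, with H[0] = 0 and each pmf summing to one.
    """
    n = len(P)
    M = len(H) - 1
    # Core matrices C(m) = P o H(m) (element-by-element product).
    C = [[[P[j][i] * H[m][j][i] for i in range(n)] for j in range(n)]
         for m in range(M + 1)]
    # surv[m][i] = Pr{waiting time in state i exceeds m steps}.
    surv, left = [], [0.0] * n
    for m in range(M + 1):
        left = [left[i] + sum(C[m][j][i] for j in range(n)) for i in range(n)]
        surv.append([1.0 - x for x in left])
    Phi = [[[1.0 if j == i else 0.0 for i in range(n)] for j in range(n)]]
    for k in range(1, K + 1):
        s = surv[min(k, M)]
        cur = [[s[i] if j == i else 0.0 for i in range(n)] for j in range(n)]
        # The m = 0 term vanishes because H(0) = 0.
        for m in range(1, min(k, M) + 1):
            prev = Phi[k - m]
            for j in range(n):
                for i in range(n):
                    cur[j][i] += sum(prev[j][l] * C[m][l][i] for l in range(n))
        Phi.append(cur)
    return Phi


def storage_values(k, n_states):
    """Number of stored values, using the 2 k N^2 estimate from the text."""
    return 2 * k * n_states ** 2


def storage_bytes(k, n_states, bytes_per_value=8):
    """Storage in bytes; 8 bytes per value is inferred from the quoted figures."""
    return storage_values(k, n_states) * bytes_per_value


# Hypothetical two-state example: state 0 moves to state 1 after 1 or 2 steps
# (equally likely); state 1 moves back to state 0 after exactly 1 step.
P = [[0.0, 1.0],
     [1.0, 0.0]]
H = [[[0.0, 0.0], [0.0, 0.0]],      # m = 0
     [[0.0, 1.0], [0.5, 0.0]],      # m = 1
     [[0.0, 0.0], [0.5, 0.0]]]      # m = 2
Phi = interval_transition_matrices(P, H, 6)
column_sums = [[sum(Phi[k][j][i] for j in range(2)) for i in range(2)]
               for k in range(7)]
```

Every column of each Φ(k) sums to one, which is a quick sanity check on the recursion. The need to retain all earlier Φ(k−m) within the holding time memory is what drives the storage estimate; `storage_values(100_000, 20)` reproduces the 80 million values quoted above.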

The goal of this work is to reduce the problems encountered in complex
system reliability analysis by expanding upon the asymptotic approximation
technique for semi-Markov processes described in [4, 5] and applying it to
fault-tolerant system models. The idea, basically, is as follows: Consider a time-invariant
finite-state continuous parameter semi-Markov process whose state probability
distribution is given by π(t) for t > 0 with π(0) known. Then π(t) can be evaluated
according to Eq. (1.1) and (1.2). Suppose the process depends on a small parameter
ε such that the state space of the process can be partitioned into disjoint classes
$E_1, \ldots, E_M$ when ε = 0. That is, no classes can communicate with any of the other
classes when ε is zero. Suppose further that the Probability Density Functions
(PDFs) that govern the transitions between states also depend on ε in the "right"
form (as will be explained in Chapter 2) and let π*(t) be the probability distribution
associated with this aggregated grouping of states. Then it can be shown [11] that
π*(t) evolves according to the Kolmogorov backward equations governing a
time-invariant Markov process, that

$$\pi_k^*(t) = \lim_{\varepsilon \to 0} \sum_{i \in E_k} \pi_i(t/\varepsilon)$$

and that the parameters defining the Markov process can be derived from those of the
original semi-Markov process. In less rigorous terms, this means that the long-term
behavior of the original model, that is the distribution $\pi^{(k)}(t/\varepsilon)$ after it is
aggregated, is asymptotically well-approximated by the distribution $\pi_k^*(t)$ which
evolves as a Markov process with known transition behavior as the small parameter
ε nears zero. If a stationary probability distribution $\pi^{(k)}$ exists for each disjoint
class of states $E_k$, then the approximation for the probability in state i is [11],

$$\pi_i(t) \approx \pi_i^{(k)}\, \pi_k^*(\varepsilon t) \qquad (1.5)$$


As can be seen, the approximate technique involves two elements, namely the
stationary probability distribution and the approximate Markov process (or enlarged
process). These results are also applicable to discrete-time time-invariant finite
state semi-Markov models [12].

The application of these results to the reduction of the complexity of the
reliability evaluations based upon generalized Markovian models is reasonably
straightforward if the system model has all the characteristics mentioned above.
Most fault-tolerant systems produce generalized Markovian models that are
approximately in the form necessary to apply these results, because the rarity of the
component failure events relative to the rate at which RM decisions are typically
made yields the small parameter ε which must be present in the characterization.
A problem typically arises with the form of the state-to-state transition holding
time density functions; this problem will be dealt with in this thesis.


1.2  Organisation  of  Thesis 

The mathematical tool that is used to model fault-tolerant systems is the
theory of semi-Markov processes. These processes are very similar to Markov
processes but have one more degree of freedom, which makes them well suited for
capturing the random delay behavior of RM decisions for non-memoryless tests.
Asymptotic enlarging of semi-Markov processes [4, 5] is the primary tool that is used
to accomplish the goal of this thesis. However, general fault-tolerant systems yield
semi-Markov models whose state-to-state transitions do not behave in the way
described in those references. Therefore, the theory will be extended here
to apply to typical fault-tolerant system models, and the parameters for the
resulting approximate Markov processes will be derived in Chapter 2.

In Chapter 3, the structure of an example fault-tolerant system is described
and the assumptions used in the model construction for it are stated. After
defining all the system states, a 9-state transition kernel matrix is constructed
which completely characterizes the system behavior. It is shown that the system
model can be decomposed into three classes of states when the component failure
rate is equal to zero. The transition kernel is then decomposed into the standard
form that will be used in the subsequent chapter to calculate the parameters of the
approximate Markov process.

Chapter 4 deals with the analysis of the accuracy of the two elements of the
approximation technique. That is, the evolution of the aggregated state probability
distribution calculated by the semi-Markov approach is compared with the state
probability distribution of the enlarged process, and the normalized probability
distribution is compared with the stationary probability distribution in each class.

The  enlarged  process  approximation  method  is  further  tested  in  Chapter  5 
with  a  general  4-state  semi-Markov  model.  Five  different  cases  are  presented  which 
capture  five  different  possible  class  to  class  transition  types  that  might  typically 
occur  in  a  fault-tolerant  system  model. 

The  sufficient  condition  imposed  on  the  semi-Markov  processes  for  the 
approximate  technique  to  be  applied  is  relaxed  and  two  theorems  associated  with 
this  relaxation  are  established  in  Chapter  6. 

Some  limitations  of  the  enlarged  process  approximation  approach  are 
examined  in  Chapter  7. 

Chapter  8  concludes  the  thesis  with  a  discussion  of  the  work  and  its 
contributions  and  suggestions  for  the  directions  that  further  research  might  take. 


Chapter  2 

Theory  of  Enlarged  Semi-Markov  Processes 


As pointed out in the Introduction, the mathematical tools used in this
thesis  are  classical  semi-Markov  process  theory  and  the  theory  of  enlarged  semi- 
Markov  processes.  Semi-Markov  process  theory  is  used  to  model  the  probabilistic 
behavior  of  a  fault-tolerant  system.  The  resulting  mathematical  model  of  a 
complicated  fault-tolerant  system  with  a  large  number  of  components  and  several 
different  levels  of  RM  decisions  is  a  high  dimensional  model  with  a  large  transition 
kernel  matrix.  Usually,  it  is  impractical  to  obtain  the  desired  state  probability 
distribution  history  over  the  mission  length  due  to  limited  computer  memory 
storage  and  the  high  computational  cost.  The  enlarged  semi-Markov  process 
theory,  to  be  described  in  Section  2.1,  is  used  to  approximate  the  large  dimension 
semi-Markov  process  by  a  low  dimension  Markov  process,  which  characterizes  the 
evolution  of  probability  among  groups  of  states.  That  is,  each  state  of  the  enlarged 
process  represents  a  group  of  states  of  the  original  semi-Markov  process. 
Frequently  in  fault-tolerant  system  models,  each  enlarged  process  state  represents  a 
group  of  states  from  the  original  model  with  the  same  number  of  working 
components  but  having  different  RM  configurations.  However,  enlarged  semi- 
Markov  process  theory  as  it  appears  in  the  current  literature  does  not  apply  to 
fault-tolerant  system  models,  as  will  be  explained  in  Section  2.2.  Therefore,  the 
theory  will  be  extended  in  Section  2.2.2  in  order  to  apply  it  to  fault-tolerant  system 
models. 

2.1  Korolyuk's  Limit  Theorem  for  Semi-Markov  Processes

References [3,4] describe the conditions under which a perturbed semi-Markov
process can be approximated over long time frames by a Markov chain. There are
essentially two of these conditions. First, the kernel of the semi-Markov process
must depend on a small positive parameter ε in such a way that the entire space of
states of the semi-Markov process E can be split into disjoint classes of states
E = ∪_k E_k, where the probabilities of departure from each class and the
sojourn time in a given state both tend to zero with ε. The total sojourn time in
each class is assumed to have a nondegenerate distribution in the limit as ε → 0
(when ε=0, the process will be referred to as the non-perturbed semi-Markov process,
while the original process will be referred to as the perturbed semi-Markov process).
Mathematically this condition can be expressed by the following equations,


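$$P^{\epsilon}_{ij}(t) = p^{\epsilon}_{ij}\, F_{ij}(t/\epsilon), \quad i, j \in E; \qquad (2.1)$$

$$p^{\epsilon}_{ij} = \begin{cases} p^{(k)}_{ij} - \epsilon\, q^{(k)}_{ij}, & i, j \in E_k, \\ \epsilon\, q^{(k)}_{ij}, & i \in E_k,\ j \notin E_k, \end{cases} \qquad (2.2)$$

where $\sum_{j \in E_k} p^{(k)}_{ij} = 1$, $i \in E_k$, $1 \le k \le m$,

and where p^ε_ij is the eventual transition probability of the original process from state i
to state j, and F_ij(t/ε) is the Cumulative Distribution Function (CDF) of the holding
time for transitions from state i to state j.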


Second, the Markov chains defined by the transition probability matrices
p^(k)_ij (i, j ∈ E_k, 1 ≤ k ≤ m) must be ergodic with stationary probability
distributions π^(k)_i (i ∈ E_k, 1 ≤ k ≤ m). When these conditions are satisfied by a
perturbed semi-Markov process, its behavior can be approximated by a
Markov chain. More specifically, if ζ^i_rk is the sojourn of the semi-Markov process in
class E_k when it begins from state i and moves to class E_r, then [4] shows that the
cumulative distribution function of the random variable ζ^i_rk can be expressed by an
exponential function when ε becomes vanishingly small:

$$\lim_{\epsilon \to 0} P\{ \zeta^i_{rk} < t \} = p_{rk} \left( 1 - e^{-\Lambda_k t} \right) \qquad (2.3)$$

As can be seen from the above equation, the dependence on i disappears on the
right hand side of the equation. That is, each state in class E_k has the same
exponential holding time density function for transitions to class E_r, for all r. So all
the states in class E_k can be merged together and the aggregated model has the
characteristic of a Markov process.
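
The ergodicity condition on each class can be checked numerically for a small example. The following is a minimal Python sketch, assuming a hypothetical three-state imbedded chain for a single class (the matrix `P_CLASS` is invented for illustration and does not come from any model in this thesis). It computes the stationary probability distribution of the class by power iteration, which converges when the chain is ergodic.

```python
def stationary_distribution(P, tol=1e-12, max_iter=10_000):
    """Return the stationary row vector pi satisfying pi = pi P.

    Uses power iteration starting from the uniform distribution.
    P is a list of rows; each row must sum to 1.
    """
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        # One step of the chain: nxt[j] = sum_i pi[i] * P[i][j]
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    raise RuntimeError("power iteration did not converge")


# Hypothetical non-perturbed transition probabilities within one class.
# The chain is irreducible and aperiodic (it has cycles of length 2 and 3).
P_CLASS = [
    [0.0, 1.0, 0.0],
    [0.5, 0.0, 0.5],
    [1.0, 0.0, 0.0],
]

PI_CLASS = stationary_distribution(P_CLASS)
print(PI_CLASS)
```

For this assumed matrix the balance equations can be solved by hand and give the distribution (0.4, 0.4, 0.2), which the iteration approaches. Power iteration is used here only because it is short; any linear solver for pi = pi P with the normalization constraint would serve equally well.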

The second part of condition 1, defined by Eq. (2.2), is often satisfied by a
fault-tolerant system model. If the system components all have small constant
failure rates proportional to ε, then each class of states for the enlarged process can
be formed by grouping together all the states that have the same groups of working
and failed components but with different statuses of the RM logic. The class-to-
class transitions are then possible only through the small possibility of failure of a
component. When ε=0, i.e. when no failures can take place, the only transitions
that are possible are those within each class due to the outcomes of the RM
decisions. If there is Built-In Test Equipment (BITE) included in the RM system, a
component that was previously isolated as failed by the RM can be brought back
on line. For this kind of system, the imbedded Markov chain for each class is
generally ergodic. Then the second part of condition 1 is satisfied. The remaining
condition that has to be satisfied is defined by Eq. (2.1), the first part of
condition 1. Usually, this condition is not satisfied by a fault-tolerant system
model. The reason is as follows: if ε is small, i.e. the Mean Time To Failure
(MTTF) of the components is large, say hundreds of hours, then the holding times
of the transitions, particularly those within a class, are determined only by the noise
in the signals and the threshold set by the FDI test designer. So, as the failure rate
tends to zero, the RM decision delay will not be affected by the failure rate, and
the transition kernel of a fault-tolerant system semi-Markov process model will not
take on the form implied by Eq. (2.1). Because Eq. (2.1) is not satisfied, the
enlarged process, if it can even be formulated, may be an invalid approximation to
the aggregated semi-Markov process model.

2.2  Extension  of  Korolyuk's  Work

As described in Section 2.1, the only condition in Korolyuk's theorem that is
not satisfied by fault-tolerant systems is the dependence of the holding times on the
small parameter: the FDI decision delay, and therefore the holding time probability
density functions, does not depend on ε. However, the state transition delay of a
semi-Markov process would be dependent on a small parameter mathematically if the
temporal line on which the delay was originally measured is scaled, say by a time
scaling factor δ. In this way, fault-tolerant system models can be modified to satisfy
all the conditions required for the enlarged process results to be applied. Section 2.2.1
shows how the transition kernels of a semi-Markov process depend on the small
parameter δ when the process is characterized on a new temporal line. Section 2.2.2
will derive the parameters of the Markov process that approximates the behavior of
the aggregated, time-scaled semi-Markov model.


2.2.1  Changing  the  Time  Scale  of  a  Perturbed  Semi-Markov  Process


A  fault-tolerant  system  model  with  a  finite  number  of  states  evolving  in 
continuous  time  is  a  semi-Markov  model  which  is  completely  characterized  by  its 
transition  kernel  matrix.  The  standard  form  of  the  (i,j)  element  of  the  matrix  is  as 
follows: 

$$k_{ij}(t) = p_{ij}\, h_{ij}(t) \qquad (2.4)$$

where p_ij is the eventual transition probability and h_ij(t) is the conditional
transition time probability density function for transitions from state i to state j.
The eventual transition probability is the probability that the process that entered
state i on its last transition will enter state j on its next transition. Before making
this transition, the process "holds" for a random time in state i, where the time is
governed by the conditional transition time probability density function. In fault-
tolerant systems with small component failure rates of order ε, h_ij(t) is related to
the PDFs of the time delay of the FDI tests. In general, h_ij(t) does not
depend on ε. However, it will depend on another small parameter δ, the time
scaling factor, if the original temporal line on which the FDI decision delay was
measured is scaled. If a stochastic process is observed in another time scale that is
1/δ times that of the original, then the holding time PDF h_ij(t) certainly will be
affected, but p_ij will be the same, because the eventual transition probability p_ij
only characterizes the transition probability from state i to state j for the next
transition whenever it occurs. Therefore, it is not related to the time scale in
which the process is observed. However, the "new" holding time PDF is not
obtained by just replacing the argument of the original PDF by t/δ: if the
original h_ij(t) is replaced by h_ij(t/δ) for the change of time scale, then integration of
h_ij(t/δ) from time t=0 to time t=∞ produces δ rather than 1. This means that h_ij(t/δ)
is not a proper holding time PDF. For a change of time scale of a stochastic
process, the CDF F_ij(t) of the corresponding PDF h_ij(t) must be found, and the
argument of the CDF, t, must be replaced by t/δ. Then the holding time PDF of
the process observed in the new time scale is h^δ_ij(t) = d/dt F_ij(t/δ) = (1/δ) h_ij(t/δ).
So, the statistics of the process in the new time scale depend on the small parameter
δ, the time scaling factor. If δ equals ε, i.e. if the time scaling factor is equal to the
failure rate of the components, then the condition is satisfied. But δ is not necessarily
equal to ε for the derivation of the enlarged process, which will be carried out
in the next section.
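
The normalization argument above can be illustrated numerically. The following is a minimal Python sketch, assuming a hypothetical exponential holding time PDF with rate 2 per original time unit and a scaling factor δ = 0.01 (both values, and the names `h`, `naive_pdf`, `scaled_pdf` and `integrate`, are chosen only for illustration). It compares the area under h(t/δ) with the area under the derivative of F(t/δ), which equals (1/δ) h(t/δ).

```python
import math

RATE = 2.0    # hypothetical rate of the original holding time PDF
DELTA = 0.01  # hypothetical time scaling factor


def h(t):
    """Original holding time PDF (exponential), in the original time unit."""
    return RATE * math.exp(-RATE * t) if t >= 0.0 else 0.0


def naive_pdf(t):
    """Argument replaced by t/DELTA only; this is not a proper PDF."""
    return h(t / DELTA)


def scaled_pdf(t):
    """Derivative of F(t/DELTA) with respect to t, i.e. h(t/DELTA)/DELTA."""
    return h(t / DELTA) / DELTA


def integrate(f, a, b, n=200_000):
    """Composite trapezoidal rule on [a, b] with n subintervals."""
    step = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for k in range(1, n):
        total += f(a + k * step)
    return total * step


# The upper limit 1.0 stands in for infinity; the neglected tail is of
# order exp(-200) for these parameter values.
AREA_NAIVE = integrate(naive_pdf, 0.0, 1.0)
AREA_SCALED = integrate(scaled_pdf, 0.0, 1.0)
print(AREA_NAIVE, AREA_SCALED)
```

Under these assumptions the first area comes out close to δ and the second close to 1, which is the reason the time-scaled PDF has to be obtained through the CDF rather than by substituting the argument directly.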

2.2.2  Derivation  of  φ_rk(s)  of  a  Time-Scaled  Perturbed  Process

As  pointed  out  in  the  last  section,  a  time-scaled  version  of  the  original  process 
is  not  required  in  the  evaluation  of  the  approximate  solution.  So  what  follows  is 
the  proof  that  the  aggregated  semi-Markov  process  in  scaled  time  evolves  as  a 
Markov  process  and  the  derivation  of  the  parameters  of  the  Markov  process.  A 
similar approach to that of reference [4] will be used in this section for the proof
and  the  derivation  of  the  parameters. 

It is assumed that the system semi-Markov model depends on the small
failure rate parameter ε in such a way that the entire space of states of the model
E can be split into disjoint classes of states E={E_1, ..., E_m} such that the
probabilities of departure from each class tend to zero as ε tends to zero. In
addition, if the process is observed on a temporal line 1/δ times that of the original,
then the sojourn in a given state tends to zero as δ tends to zero. To illustrate this
point, consider a process that is observed in terms of hours while it originally was
described in terms of seconds. Then the PDF describing the delay for transitions
from state i to state j will be "crushed" near the origin, so the sojourn in state i
will be small in units of hours. Because the whole process is observed in a new time
scale, all of the F_ij( ) will depend on the small parameter δ.

A time-scaled semi-Markov process with the above characteristics can be
characterized by the following equations,

$$p^{\epsilon}_{ij} = \begin{cases} p^{(k)}_{ij} - \epsilon\, q^{(k)}_{ij}, & i, j \in E_k, \\ \epsilon\, q^{(k)}_{ij}, & i \in E_k,\ j \notin E_k, \end{cases} \qquad (2.5)$$

where p^ε_ij are the eventual transition probabilities of the imbedded Markov chain,
and the non-perturbed eventual transition probabilities p^(k)_ij satisfy the following
equation

$$\sum_{j \in E_k} p^{(k)}_{ij} = 1, \quad i \in E_k,\ 1 \le k \le m \qquad (2.6)$$

and the element of the transition probability matrix can be expressed as,

$$P^{\epsilon}_{ij}(t) = p^{\epsilon}_{ij}\, F_{ij}(t/\delta), \quad i, j \in E \qquad (2.7)$$

where F_ij( ) is the CDF of the transition delay for the process in the original time
scale. Eq. (2.7) is a generalization of the form of the transition probability matrix
elements that define the semi-Markov process.

If ζ^i_rk denotes the sojourn of the semi-Markov process in class E_k when it
starts from state i and moves to E_r, and θ_i denotes the sojourn of the semi-Markov
process in state i, with the CDF F_ij(t), while δ_ij are the indicators of transition from
state i to state j, then the following is
obtained by using the expression for the total probability:




itEk 

+  E  P<^=1’  **£*>  (2-8) 

j€Ef 

Hence 

^{^<0=  E  fnftz  •—)*%(•)+  E  •')  <2-9) 


j‘€Eifc  •'e£r 

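Using the Laplace transform,

$$\phi^i_{rk}(s) = E\{ e^{-s\, \zeta^i_{rk}} \} \qquad (2.10)$$

$$p^{\epsilon}_{ij}(s) = \int_0^{\infty} e^{-st}\, dP^{\epsilon}_{ij}(t) \qquad (2.11)$$

then Eq. (2.9) becomes,

$$\phi^i_{rk}(s) = \sum_{j \in E_k} \phi^j_{rk}(s)\, p^{\epsilon}_{ij}(s) + \sum_{j \in E_r} p^{\epsilon}_{ij}(s) \qquad (2.12)$$

Combining the Laplace transform of Eq. (2.7) and Eq. (2.5):

$$p^{\epsilon}_{ij}(s) = \left( p^{(k)}_{ij} - \epsilon\, q^{(k)}_{ij} \right) \left( 1 - \delta s\, a_{ij} + o(\delta) \right), \quad j \in E_k \qquad (2.13)$$

$$p^{\epsilon}_{ij}(s) = \epsilon\, q^{(k)}_{ij} + o(\epsilon), \quad j \notin E_k \qquad (2.14)$$

substituting these expressions in Eq. (2.12), it becomes,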

substituting  these  expressions  in  eq.(2.12),  it  becomes, 
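
$$\phi^i_{rk}(s) - \sum_{j \in E_k} \phi^j_{rk}(s)\, p^{(k)}_{ij} = -\sum_{j \in E_k} \left( \delta s\, a_{ij}\, p^{(k)}_{ij} + \epsilon\, q^{(k)}_{ij} \right) \phi^j_{rk}(s) + \epsilon \sum_{j \in E_r} q^{(k)}_{ij} + o(\epsilon) \qquad (2.15)$$

Passing to the limit as ε and δ → 0, the functions φ^i_rk(s) are found to satisfy the
system of equations,

$$\phi^i_{rk}(s) - \sum_{j \in E_k} p^{(k)}_{ij}\, \phi^j_{rk}(s) = 0 \qquad (2.16)$$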


It follows from this and the assumption that the imbedded Markov chain defined
by the transition probabilities p^(k)_ij (i, j ∈ E_k) is ergodic, that (see [1]) the solution
of system Eq. (2.16) is independent of the superscript, i.e. for all i ∈ E_k,
φ^i_rk(s) = φ_rk(s). Multiplying Eq. (2.15) by the stationary probabilities π^(k)_i and
summing over all i ∈ E_k, then cancelling ε, the following is obtained,

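$$\sum_{i \in E_k} \pi^{(k)}_i \sum_{j \in E_k} \left( \frac{\delta}{\epsilon}\, s\, a_{ij}\, p^{(k)}_{ij} + q^{(k)}_{ij} \right) \phi_{rk}(s) = \sum_{i \in E_k} \pi^{(k)}_i \sum_{j \in E_r} q^{(k)}_{ij} \qquad (2.17)$$

$$\phi_{rk}(s) = \frac{\sum_{i \in E_k} \pi^{(k)}_i \sum_{j \in E_r} q^{(k)}_{ij}}{\sum_{i \in E_k} \pi^{(k)}_i \sum_{j \in E_k} \left( \frac{\delta}{\epsilon}\, s\, a_{ij}\, p^{(k)}_{ij} + q^{(k)}_{ij} \right)} \qquad (2.18)$$

$$\phi_{rk}(s) = p_{rk}\, \frac{\Lambda_k/\alpha}{\Lambda_k/\alpha + s} \qquad (2.19)$$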


where

$$\alpha = \frac{\delta}{\epsilon} \qquad (2.20)$$


(2.21) 


(2.22) 

(2.23) 

(2.24) 

(2.25) 


$$a_{ij} = \int_0^{\infty} t\, dF_{ij}(t) \qquad (2.26)$$

This completes the proof that any semi-Markov model with the properties
stated above can be approximated by a Markov process, whose parameters were
also derived. The Markov process evolves in a longer time scale, i.e. 1/δ times that
of the original process. For instance, if δ=1/3600 and the original semi-Markov
model evolves in seconds, then the approximate enlarged Markov process will evolve
in hours.


One of the sufficient conditions in the derivation in this chapter for the
enlarged process is that all the classes must be ergodic. This condition is not
generally satisfied by all fault-tolerant system models. One non-ergodic model will
be examined in Chapter 5 and this issue will also be discussed in Chapter 6.

There are two parameters involved in the derivation, namely ε and δ, but the
parameter that actually affects the behavior of the original semi-Markov process is
ε, while δ is just a time scaling factor that relates the time scale of the approximate
Markov process and that of the original semi-Markov process. However, there is no
known way to show how small ε must be for the Markov process to be a good
description of the behavior of the aggregated semi-Markov model. So, assessment
of the effect of the small parameters will have to rely on empirical results. For this
purpose, a fault-tolerant system semi-Markov model will be constructed in the next
chapter.


Chapter  3 

Construction  of  Fault-Tolerant
System  Model


In the preceding chapter it was proved that under certain conditions, such as
vanishingly small ε and ergodic classes, an aggregated, perturbed, time-scaled semi-
Markov process evolves asymptotically as a Markov process, and the parameters of
the approximate Markov process were also derived. However, bounds on the size of
ε are not known for the Markov process to be a good approximation of the original
semi-Markov process. As mentioned before, ε is usually the system component
failure rate. Then the question arises: for the approximation to be reasonably
good, would ε have to be extremely small? In other words, do the MTTFs of the
flight control system components or subsystems have to be unrealistically big, say 5
years, which is equivalent to ε=4.47x10*®, for the aggregated system model to
behave approximately as a Markov process? This provides the motivation for the
construction  of  a  generalized  Markovian  fault-tolerant  system  model  in  this  chapter 
for  such  investigation  and  for  the  demonstration  of  the  approximation  technique. 
Since  the  base-line  numerical  results  of  the  model  will  be  calculated  from  semi- 
Markov  theory  the  system  model  will  have  to  be  small  enough  to  avoid  excessive 
memory  storage  and  computational  burden,  but  it  will  be  rich  enough  to  include 
sequential  FDI  tests  and  self-tests  that  are  found  in  many  fault-tolerant  systems. 
Since  the  theory  developed  in  Chapter  2  is  in  the  continuous  time  domain,  the 
model  will  also  be  formulated  in  continuous  time.  Any  conclusions  obtained  in 
continuous  time  theory  should  also  be  valid  in  the  discrete  time  case. 


This  chapter  begins  with  a  section  that  describes  the  architecture  and  FDI 
structure  of  an  example  fault-tolerant  system.  The  next  section  states  the 
assumptions that are made in the model construction. The state definitions will be
presented  in  Section  3.3.  The  formation  of  the  transition  kernel  of  the  semi- 
Markov  process  is  illustrated  in  the  next  section.  Decomposition  of  the  transition 
kernel  into  the  required  form  is  included  in  the  following  section. 

3.1  Structure  of  Three-Component  Fault-Tolerant  System

Suppose that the fault-tolerant system, or subsystem, comprises three
independent instruments which are measuring (or actuating or otherwise operating
on) a single scalar quantity. Such situations arise in applications such as flight
control (e.g. body rate sensors along a given axis and actuators for segmented
control surfaces) and highly reliable data processors (e.g. redundant synchronizing
clocks). In the measuring instruments case, three independent observations of a
scalar quantity are available. With a set of two linearly independent parity
equations, those three independent observations are used to generate a vector parity
residual sequence. The RM in the system relies on the Vector Shiryayev Sequential
Test (VSST), which makes use of the vector parity residual sequence to detect and
identify the failure mode (see Section 3.1.3 of [10]). In contrast to other sequential
tests (e.g. the Sequential Probability Ratio Test or SPRT), there is no need with
the VSST for a separate isolation stage once a failure mode is detected. When an
instrument is identified as failed, it is removed from the system by the
reconfiguration scheme.

Once an instrument is removed from the system, an SPRT self-monitoring test
is initiated on the isolated instrument. The intent here is to model the
implementation of BITE monitoring that is often included in real systems. The
self-test produces either a failed or an unfailed indication on an isolated instrument,
and when there are two consecutive indications that the instrument is unfailed,
the instrument is brought back on line and the VSST FDI test is reinitiated. It
is assumed that no effort is made to detect further failures when two unflagged
instruments remain available.
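
The reinstatement rule of the self-test can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `reinstated_after` and the boolean encoding of the indications are hypothetical, and it is assumed that a failed indication resets the count of consecutive unfailed indications to zero.

```python
def reinstated_after(indications, required=2):
    """Return the index of the self-test indication at which an isolated
    instrument is brought back on line, or None if that never happens.

    indications: iterable of booleans, True meaning an 'unfailed' indication.
    required:    number of consecutive unfailed indications needed (two in
                 the example system described above).
    Assumption:  a 'failed' indication resets the consecutive count to zero.
    """
    consecutive = 0
    for index, unfailed in enumerate(indications):
        consecutive = consecutive + 1 if unfailed else 0
        if consecutive >= required:
            return index
    return None


# An unfailed indication followed by a failed one does not reinstate the
# instrument; the next two unfailed indications in a row do.
print(reinstated_after([True, False, True, True]))
```

The count of consecutive unfailed indications is the quantity that later appears as the last entry of the state notation for the two-instrument states.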

3.2  Assumptions  in  Model  Construction 

The complete structure of the fault-tolerant system was described in the last
section. Before we proceed to construct the associated generalized Markovian
model, several assumptions will be made. Some of these assumptions make this
example, and most other fault-tolerant systems, easier to analyze by semi-Markov
techniques. These assumptions are as follows:

(a)  The  time  to  failure  in  any  particular  instrument  is  exponentially 
distributed  and  independent  of  the  status  of  other  instruments  and  the  RM 
decisions. 

(b)  The  probability  of  more  than  one  event  occurring  during  any  dt  is 
negligible.  These  events  include  failures  of  components  and  decisions  by 
the  RM  system  or  by  the  self-tests. 

Assumptions  (a)  and  (b)  are  widely  used  in  the  analysis  of  fault-tolerant 
system  performance,  so  no  further  justification  for  them  will  be  given  here. 

Following [10], consider the situation where a failure occurs at some time
other than a state transition time, that is, it does not occur at a renewal time
(where the VSST is reinitialized). In this case, the VSST will have established values
of the test statistics which are distributed according to distributions conditioned on
the hypothesis that no failure is present. Since the VSST is initialized with zero
test statistics, this implies that the test has a "head start" towards detection at the
time of a failure, which is likely to yield a smaller delay to detection relative to the
delay associated with newly initialized test statistics. However, if the test has been
designed to achieve a low false alarm probability, the effect of assuming that the
unaffected test statistics are at their initial conditions should be minor relative to the
effect of making the same assumption for the test that is affected by the failure.
Thus:

(c)  For  VSST  and  SPRT,  the  occurrence  of  a  failure  is  assumed  to  coincide 
with  a  renewal  time  for  the  test. 

The  last  assumption  below  is  an  unrealistic  one.  However,  it  can  still  capture 
the  non-memoryless  nature  of  the  self-test: 

(d)  Failed  and  unfailed  indications  by  the  self-test  are  independent. 

Although this assumption does not hold for an SPRT, under this assumption
and assumption (b), the time to failed and unfailed decisions will have the same
density function but with different eventual transition probabilities. If the failed
indication rate is higher than the unfailed indication rate given that the isolated
component is failed, then a higher eventual probability for failed
indications will appear in the transition kernel. These assumptions will be
used in the transition kernel construction in Section 3.4.

3.3  State  Definitions 


We  are  now  in  a  position  to  define  the  states  for  the  semi-Markov  model  of 
the  example  fault-tolerant  system  described  in  Section  3.1.  The  state 
characterizations  must  include  all  the  information  necessary  to  formulate  the 
transition kernels for the exit transitions out of each state. In this system, it is
necessary  to  know  the  following  in  order  to  characterize  each  state: 

1.  The  number  of  instruments  that  are  available  for  use. 

2.  Of  these,  how  many  of  them  have  failed. 

3.  If  an  instrument  has  been  isolated  by  FDI  as  failed,  the  status  of  the 
isolated  component  and  the  number  of  unfailed  indications  by  the  self-test 
for  this  instrument. 

Consider  what  happens  if  all  of  the  possible  system  configurations  are 
enumerated  as  the  system  states.  For  example,  suppose  that  the  condition  where 
the  first  instrument  is  failed  and  the  other  two  are  working,  in  the  case  of  3 
available  instruments,  is  enumerated  as  state  1,  the  second  instrument  failed  and 
the  other  two  working,  in  the  case  of  3  available  instruments,  is  enumerated  state 
2,  etc.  Then  the  resulting  model  will  have  twenty-six  states.  However,  since  all  the 
instruments  for  this  example  are  the  same,  there  will  be  no  difference  between 
states 1 and 2 in terms of the number of failed and working instruments or in terms
of  how  many  failed  instruments  are  still  in  use.  Only  the  number  of  failed  and 
working  instruments  and  the  number  of  unfailed  indications  from  the  self-test  are 
necessary  in  the  state  definitions.  So,  by  merging  the  states,  the  dimension  of  the 
model  can  be  greatly  reduced,  in  this  case  to  just  nine. 

The  unacceptably  degraded  condition,  which  is  a  trapping  state,  is  denoted  by 
SL  (  system  loss  )  and  is  assumed  to  comprise  all  system  configurations  that  involve 
two  or  more  failed  components. 

Let  the  state  characterizations  be  denoted  by  the  following  notation  where 
brackets  indicate  sets  of  possibilities  from  which  one  and  only  one  element  will 
appear  in  each  state  characterization: 


3/[0,F]                3 instruments available for use

2/[0,F]/[C,W]/[0,1]    2 instruments available for use


In  the  case  where  3  instruments  are  available  for  use,  the  leading  3  represents  the 
three  available  instruments.  In  the  second  entry,  0  represents  no  failure  is  present, 
F  indicates  there  is  1  failed  component.  The  case  of  2  failures  is  not  included 
because  it  represents  a  system  loss.  When  two  instruments  are  available  for  use, 
the  notation  with  the  leading  2  follows  the  same  convention  as  before  and 
represents  the  two  available  instruments.  The  0  or  F  in  the  second  entry  indicates 
the  presence  of  no  failure  or  1  failure  among  the  three  components,  respectively.  C 
or  W  in  the  third  entry  indicates  whether  the  isolated  component  has  been 
correctly  or  wrongly  isolated,  respectively.  The  last  entry  represents  the  number  of 
consecutive  unfailed  indications  from  the  self-test  for  this  instrument.  As  an 
example, consider the state denoted by 2/F/W/1. This means that two
instruments  are  available  for  use  and  one  of  the  three  is  failed.  Furthermore,  the 
isolated  component  has  been  wrongly  isolated  (i.e.  it  is  not  the  failed  one).  Finally, 
there  has  been  one  unfailed  indication  from  the  self-test  for  the  isolated  component. 
An exhaustive list of all the states and the state numbering scheme follows:

state   s.c.n.¹    state description

1       3/0        3 available, none failed, VSST in operation.
2       2/0/W/0    2 available, none of the three failed, no unfailed indication from self-test.
3       2/0/W/1    2 available, none of the three failed, 1 unfailed indication from self-test.
4       3/F        3 available, 1 failed, VSST in operation (i.e. detection delayed).
5       2/F/C/0    2 available, 1 of the three failed and correctly isolated, no unfailed indication from self-test.
6       2/F/C/1    2 available, 1 of the three failed and correctly isolated, 1 unfailed indication from self-test.
7       2/F/W/0    2 available, 1 of the three failed but incorrectly isolated, no unfailed indication from self-test.
8       2/F/W/1    2 available, 1 of the three failed but incorrectly isolated, 1 unfailed indication from self-test.
9       SL         system loss.

As  can  be  seen  above,  it  requires  9  states  to  describe  the  operational  states  of  this 
fault-tolerant  system.  From  now  on,  this  system  model  will  be  referred  to  as  the 
9-state  model. 

The transitions out of each of the nine states correspond to the occurrence of
one of the random events, such as component failures and RM decisions. The state
transition event trees for all 9 states are given in Figure 3-1. To clarify how the
system can transit from one state to another, a state transition diagram is
shown in Figure 3-2.

¹ state characterization notation

From state        To state          Transition due to occurrence of

(1) 3/0           (2) 2/0/W/0       false alarm by VSST.
                  (4) 3/F           failure of one of the 3 instruments.

(2) 2/0/W/0       (2) 2/0/W/0       failed indication from self-test.
                  (3) 2/0/W/1       unfailed indication.
                  (5) 2/F/C/0       failure of isolated instrument.
                  (7) 2/F/W/0       failure of one of the two available instruments.

(3) 2/0/W/1       (1) 3/0           2nd consecutive unfailed indication from self-test, instrument brought back on line.
                  (2) 2/0/W/0       failed indication from self-test.
                  (6) 2/F/C/1       failure of isolated instrument.
                  (8) 2/F/W/1       failure of one of the two available instruments.

(4) 3/F           (5) 2/F/C/0       correct isolation by VSST.
                  (7) 2/F/W/0       wrong isolation by VSST.
                  (9) SL            2nd failure of instrument.

(5) 2/F/C/0       (5) 2/F/C/0       failed indication from self-test.
                  (6) 2/F/C/1       unfailed indication after previous failed indication from self-test.
                  (9) SL            2nd failure of instrument.

(6) 2/F/C/1       (4) 3/F           2nd consecutive unfailed indication from self-test, instrument brought on line.
                  (5) 2/F/C/0       failed indication from self-test.
                  (9) SL            2nd failure of instrument.

(7) 2/F/W/0       (7) 2/F/W/0       failed indication from self-test.
                  (8) 2/F/W/1       unfailed indication from self-test.
                  (9) SL            2nd failure of instrument.

(8) 2/F/W/1       (4) 3/F           2nd consecutive unfailed indication from self-test, instrument brought on line.
                  (7) 2/F/W/0       failed indication from self-test.
                  (9) SL            2nd failure of instrument.

(9) SL            (9) SL            trapping state.

Figure 3-1: State transition event trees
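The event trees of Figure 3-1 can also be written down as a small lookup table, which makes it easy to count the nonzero kernel elements. The sketch below is only an illustration: the dictionary name and layout are assumptions made here, and the state numbers are taken from the state list above.

```python
# Event trees of Figure 3-1: source state -> {destination state: causing event}.
# EVENT_TREES is a hypothetical name used only for this illustration.
EVENT_TREES = {
    1: {2: "false alarm by VSST",
        4: "failure of one of the 3 instruments"},
    2: {2: "failed indication from self-test",
        3: "unfailed indication",
        5: "failure of isolated instrument",
        7: "failure of one of the two available instruments"},
    3: {1: "2nd consecutive unfailed indication, instrument back on line",
        2: "failed indication from self-test",
        6: "failure of isolated instrument",
        8: "failure of one of the two available instruments"},
    4: {5: "correct isolation by VSST",
        7: "wrong isolation by VSST",
        9: "2nd failure of instrument"},
    5: {5: "failed indication from self-test",
        6: "unfailed indication after previous failed indication",
        9: "2nd failure of instrument"},
    6: {4: "2nd consecutive unfailed indication, instrument on line",
        5: "failed indication from self-test",
        9: "2nd failure of instrument"},
    7: {7: "failed indication from self-test",
        8: "unfailed indication from self-test",
        9: "2nd failure of instrument"},
    8: {4: "2nd consecutive unfailed indication, instrument on line",
        7: "failed indication from self-test",
        9: "2nd failure of instrument"},
    9: {9: "trapping state"},
}

# Each (source, destination) pair corresponds to one nonzero kernel element.
n_kernel_elements = sum(len(tree) for tree in EVENT_TREES.values())
print(n_kernel_elements)
```

Counting the pairs this way gives the number of nonzero transition kernel elements that the kernel matrix of the 9-state model must contain.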


3.4  Transition  Kernel  Matrix  for  the  9-State  Model 


In  order  to  formulate  the  transition  kernels  for  the  9-state  model  in  closed 
form,  the  following  conditional  decision  time  density  functions  associated  with  the 
two  sequential  tests  employed  by  the  system  and  the  time  to  failure  density 
function  of  each  instrument  are  assumed  to  be  known: 


$f_V^0(\tau)$ : density function of time to isolation by VSST under the condition that no
failure is present, with parameter $\lambda_0$.

$f_V^1(\tau)$ : density function of time to isolation by VSST under the condition that one
failure is present, with parameter $\lambda_1$.

$f_{W0}(\tau)$ : density function of time to failed indication by self-test SPRT under the
condition that no failure is present in the isolated instrument, with
parameter $\lambda_{W0}$.

$f_{W1}(\tau)$ : density function of time to unfailed indication by self-test SPRT under the
condition that no failure is present in the isolated instrument, with
parameter $\lambda_{W1}$.

$f_{F0}(\tau)$ : density function of time to failed indication by self-test SPRT under the
condition that a failure is present in the isolated instrument, with
parameter $\lambda_{F0}$.

$f_{F1}(\tau)$ : density function of time to unfailed indication by self-test SPRT under the
condition that a failure is present in the isolated instrument, with
parameter $\lambda_{F1}$.

$f(\tau)$ : density function of time to failure of each instrument, with parameter
$\epsilon$.


These decision time density functions for the tests will be assumed to be 2nd order
Erlang functions (see Appendix B for the properties of the density function). They
are relatively realistic because the sequential tests are unlikely to reach their
decision either a very short time or a very long time after they are initiated.
Rather, they are more likely to reach a decision in a region of time that is
some distance after the test is initiated. To illustrate this point, a Monte Carlo
simulation² for the correct decision Probability Mass Function (PMF) of a VSST
was obtained and is plotted in Figure 3-3. It shows that most of the decisions are
reached at around 18 seconds after the test is initiated.
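The shape argument for the 2nd order Erlang density can be illustrated numerically. The sketch below is not the Monte Carlo simulation mentioned above; it only evaluates the density $\lambda^2 t e^{-\lambda t}$ on a grid, using $\lambda = 0.05$ (the value listed for $\lambda_1$ in Section 4.1) as an assumed example, and locates its peak.

```python
import math

def erlang2_pdf(t, lam):
    """2nd order Erlang density: lam^2 * t * exp(-lam * t)."""
    return lam * lam * t * math.exp(-lam * t)

LAM = 0.05                                # assumed example parameter, 1/sec
grid = [0.5 * k for k in range(0, 401)]   # 0 to 200 seconds in 0.5 s steps

# The density is zero at t = 0, rises to a single peak, then decays.
mode = max(grid, key=lambda t: erlang2_pdf(t, LAM))
mean = 2.0 / LAM

print(mode, mean)   # peak at 1/lam, mean at 2/lam
```

The density vanishes at the instant the test starts and has a single peak at $1/\lambda$, which is the qualitative behavior the paragraph above attributes to the sequential tests.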

After the conditional decision time density functions for the tests are known,
the transition kernel can be constructed by considering what the kernel elements
actually represent:

$$p_{ji}(\tau)\,d\tau = p_{ji}\,h_{ji}(\tau)\,d\tau = \Pr\{\, i \to j \text{ in } [\tau, \tau + d\tau) \mid \text{enter } i \text{ at } 0 \,\}$$

By expanding the meaning of $i \to j$ and using the definition of conditional probability,
$p_{ji}(\tau)\,d\tau$ can be rewritten in two different forms as follows:

$$p_{ji}(\tau)\,d\tau = \Pr\{\, i \to j \text{ in } [\tau, \tau + d\tau) \text{ and no } i \to k \text{ at any } t < \tau \text{ for } k = 1, 2, \ldots, N \mid \text{enter } i \text{ at } 0 \,\} \tag{3.1a}$$

$$= \Pr\{\, i \to j \text{ in } [\tau, \tau + d\tau) \mid \text{no } i \to k \text{ at any } t < \tau \text{ for } k = 1, 2, \ldots, N \text{ and enter } i \text{ at } 0 \,\} \cdot \Pr\{\, \text{no } i \to k \text{ at any } t < \tau \text{ for } k = 1, \ldots, N \mid \text{enter } i \text{ at } 0 \,\} \tag{3.1b}$$

Following [10], the form in Eq. (3.1a) will be called the direct form because it is
simply a restatement of the definition of $p_{ji}(\tau)$. Eq. (3.1b) will be called the
conditional decomposition of $p_{ji}(\tau)$. For clarity, Eq. (3.1b) can be modified as,

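$$p_{ji}(\tau)\,d\tau = \Pr\{\, i \to j \text{ in } [\tau, \tau + d\tau) \text{ and no } i \to j \text{ at any } t < \tau \mid \text{no } i \to k \text{ at any } t < \tau \text{ for } k = 1, 2, \ldots, j-1, j+1, \ldots, N \text{ and enter } i \text{ at } 0 \,\} \cdot \Pr\{\, \text{no } i \to k \text{ at any } t < \tau \text{ for } k = 1, 2, \ldots, j-1, j+1, \ldots, N \mid \text{enter } i \text{ at } 0 \,\} \tag{3.1c}$$

² Monte Carlo simulation source code is supplied by the author of reference [10].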

The conditional decomposition of $p_{ji}(\tau)$ for each j and i provides a complete
definition of the behavior of the semi-Markov process. Construction of the
transition kernel by use of the conditional decomposition is particularly useful for
fault-tolerant system models because the eventual transition probability for each
state transition is generally not known.

The construction of two representative transition kernel elements of the
9-state model is described below. First, the transition kernel element $p_{21}(\tau)$ for
the transition from state 1 to state 2 is derived. State 1, in which all the
instruments are working (state characterization notation 3/0), can only transit
to state 2, with state characterization notation 2/0/W/0, and to state 4, with state
characterization notation 3/F. Hence, the transition from 1 to 2 represents the
occurrence of a false alarm by the VSST in the absence of a failure of any one of
the instruments. Using the definition of Eq. (3.1b), the transition kernel element is
derived as follows:

$$p_{21}(\tau)\,d\tau = \Pr\{\, 1 \to 2 \text{ in } [\tau, \tau + d\tau) \text{ and no } 1 \to 2 \text{ at any } t < \tau \mid \text{no } 1 \to 4 \text{ at any } t < \tau \text{ and enter 1 at 0} \,\} \cdot \Pr\{\, \text{no } 1 \to 4 \text{ at any } t < \tau \mid \text{enter 1 at 0} \,\} \tag{3.2}$$


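In terms of the conditional density function of the VSST decision time and the
density function of the instrument failure time defined above:

$$p_{21}(\tau)\,d\tau = f_V^0(\tau)\,d\tau \left[\, 1 - \int_0^{\tau} f(\eta)\,d\eta \,\right]^3 \tag{3.3}$$

If

$$f_V^0(\tau) = \lambda_0^2\,\tau\,e^{-\lambda_0 \tau} \quad \text{(2nd order Erlang)}$$

$$f(\tau) = \epsilon\,e^{-\epsilon \tau} \quad \text{(exponential)}$$

then

$$p_{21}(\tau)\,d\tau = \lambda_0^2\,\tau\,e^{-\lambda_0 \tau}\,d\tau \left[\, e^{-\epsilon \tau} \,\right]^3$$

or

$$p_{21}(t) = \lambda_0^2\,t\,e^{-(\lambda_0 + 3\epsilon)t} \tag{3.4}$$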
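As a numerical sanity check on Eq. (3.4), the kernel element can be integrated over all time to obtain the eventual probability of the 1 to 2 transition, which should equal $\lambda_0^2/(\lambda_0+3\epsilon)^2$. The sketch below assumes the parameter values quoted in Section 4.1; the helper names and the trapezoidal rule are choices made here, not part of the source.

```python
import math

LAMBDA_0 = 0.001   # VSST parameter with no failure present (Section 4.1 value)
EPSILON = 2.5e-6   # failure rate of each instrument, 1/sec (Section 4.1 value)

def p21(t):
    """Transition kernel element of Eq. (3.4)."""
    return LAMBDA_0 ** 2 * t * math.exp(-(LAMBDA_0 + 3 * EPSILON) * t)

def trapezoid(f, a, b, n):
    """Composite trapezoidal rule with n equal subintervals."""
    h = (b - a) / n
    return h * (0.5 * f(a) + sum(f(a + k * h) for k in range(1, n)) + 0.5 * f(b))

# The integrand is negligible beyond 40000 s for these parameter values.
eventual_numeric = trapezoid(p21, 0.0, 40000.0, 200000)
eventual_closed = LAMBDA_0 ** 2 / (LAMBDA_0 + 3 * EPSILON) ** 2

print(eventual_numeric, eventual_closed)
```

The result is slightly below one because a failure (the 1 to 4 transition) can occasionally pre-empt the false alarm.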

Another transition kernel element to be considered explicitly here is the one
representing transitions from state 2 to state 5. State 2 and state 5 have state
characterization notation 2/0/W/0 and 2/F/C/0, respectively. The other states that
state 2 can transit to are states 2, 3 and 7, corresponding to state notation
2/0/W/0, 2/0/W/1 and 2/F/W/0, respectively. Hence the transition from state 2
to state 5 represents the occurrence of a failure in the isolated instrument in the
absence of any failures among the two available instruments and of any decision
reached by the self-test. The transition kernel element can then be derived as
follows:

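$$p_{52}(\tau)\,d\tau = \Pr\{\, 2 \to 5 \text{ in } [\tau, \tau + d\tau) \text{ and no } 2 \to 5 \text{ at any } t < \tau \mid \text{no } 2 \to 2, 3, 7 \text{ at any } t < \tau \text{ and enter 2 at 0} \,\} \cdot \Pr\{\, \text{no } 2 \to 2, 3, 7 \text{ at any } t < \tau \mid \text{enter 2 at 0} \,\} \tag{3.5}$$

With assumptions (a) and (d), Eq. (3.5) can be rewritten in terms of the conditional
density functions of the test decision times, where $f_{W0}$ and $f_{W1}$ are the self-test
failed-indication and unfailed-indication decision time densities for an unfailed
isolated instrument (parameters $\lambda_{W0}$ and $\lambda_{W1}$) and $f$ is the instrument failure
time density:

$$p_{52}(\tau)\,d\tau = f(\tau)\,d\tau \left[\, 1 - \int_0^{\tau} f_{W0}(\eta)\,d\eta \,\right] \left[\, 1 - \int_0^{\tau} f_{W1}(\eta)\,d\eta \,\right] \left[\, 1 - \int_0^{\tau} f(\eta)\,d\eta \,\right]^2 \tag{3.6}$$

By substituting expressions for the density functions:

$$p_{52}(\tau)\,d\tau = \epsilon e^{-\epsilon \tau}\,d\tau\,(\lambda_{W0}\tau + 1)\,e^{-\lambda_{W0}\tau}\,(\lambda_{W1}\tau + 1)\,e^{-\lambda_{W1}\tau}\,e^{-2\epsilon \tau}$$

$$= \epsilon\,(\lambda_{W0}\tau + 1)(\lambda_{W1}\tau + 1)\,e^{-(\lambda_{W0} + \lambda_{W1} + 3\epsilon)\tau}\,d\tau$$

or

$$p_{52}(t) = \epsilon\,(\lambda_{W0}t + 1)(\lambda_{W1}t + 1)\,e^{-(\lambda_{W0} + \lambda_{W1} + 3\epsilon)t} \tag{3.7}$$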
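The step from Eq. (3.6) to Eq. (3.7) can be checked numerically by evaluating the survival integrals directly and comparing the product with the closed form. This is a sketch under stated assumptions: the parameter values are those quoted in Section 4.1, and the function names and the trapezoidal integration are illustrative choices made here.

```python
import math

LAM_W0, LAM_W1, EPS = 0.05, 0.1, 2.5e-6   # Section 4.1 values, 1/sec

def erlang2_pdf(t, lam):
    """2nd order Erlang density."""
    return lam * lam * t * math.exp(-lam * t)

def failure_pdf(t):
    """Exponential time-to-failure density of one instrument."""
    return EPS * math.exp(-EPS * t)

def survival(pdf, t, n=2000):
    """1 minus the integral of pdf over [0, t], by the trapezoidal rule."""
    if t == 0.0:
        return 1.0
    h = t / n
    area = h * (0.5 * pdf(0.0) + sum(pdf(k * h) for k in range(1, n)) + 0.5 * pdf(t))
    return 1.0 - area

def p52_product_form(t):
    """Right-hand side of Eq. (3.6), with the integrals done numerically."""
    return (failure_pdf(t)
            * survival(lambda x: erlang2_pdf(x, LAM_W0), t)
            * survival(lambda x: erlang2_pdf(x, LAM_W1), t)
            * survival(failure_pdf, t) ** 2)

def p52_closed_form(t):
    """Right-hand side of Eq. (3.7)."""
    return (EPS * (LAM_W0 * t + 1) * (LAM_W1 * t + 1)
            * math.exp(-(LAM_W0 + LAM_W1 + 3 * EPS) * t))

print(p52_product_form(10.0), p52_closed_form(10.0))
```

The two values agree to within the integration error, which confirms that each bracket in Eq. (3.6) contributes a factor $(\lambda t + 1)e^{-\lambda t}$ for an Erlang test and $e^{-\epsilon t}$ for each instrument.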

Two of the twenty-six nonzero transition kernel elements were derived above.
The remaining elements are included in Appendix C. The fault-tolerant system
model is completely characterized by this transition kernel matrix, and state
probability histories can be derived from it by using Eq. (1.2). Any aspect of the
system performance statistics can be derived from it. The complete transition
kernel matrix is given in Eq. (3.8).

If $\epsilon$ is set equal to zero in Eq. (3.8), that is, if no failures can occur among the
instruments, then the transition kernel will be reduced to the form shown in Figure
3-4. This matrix can be partitioned into a block diagonal matrix consisting of 3
blocks, which implies that no transitions occur between the states associated with
different blocks. The states within each of these three blocks therefore form a closed
class, and when the original process is reduced to a non-perturbed semi-Markov
process, the resulting process consists of 3 classes. The first class
comprises states 1, 2 and 3, each of which has all three instruments working but
with different RM levels. Class two contains all the system states with exactly one
failed instrument, i.e. states 4, 5, 6, 7 and 8. State 9, the system loss state, is the
sole element of the third class.

Figure 3-4: Structure of non-perturbed 9-state model transition kernel matrix
(X: non-zero transition kernel elements)
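The three closed classes can be recovered mechanically from the transition structure. The sketch below assumes that setting $\epsilon = 0$ simply removes every failure-driven transition listed in Figure 3-1, leaving only the transitions caused by RM decisions; the dictionary and function names are hypothetical.

```python
# Non-perturbed transitions (epsilon = 0): only RM-decision transitions remain.
# NONPERTURBED maps each source state to the states it can still reach in one step.
NONPERTURBED = {
    1: [2],
    2: [2, 3],
    3: [1, 2],
    4: [5, 7],
    5: [5, 6],
    6: [4, 5],
    7: [7, 8],
    8: [4, 7],
    9: [9],
}

def reachable(start):
    """Set of states reachable from `start` in one or more transitions."""
    seen, stack = set(), [start]
    while stack:
        state = stack.pop()
        for nxt in NONPERTURBED[state]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return frozenset(seen)

# States with identical reachable sets belong to the same closed class.
classes = sorted({reachable(s) for s in NONPERTURBED}, key=min)
print([sorted(c) for c in classes])
```

Every state in a class reaches exactly the states of its own class and no others, which is the block diagonal structure described above.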


3.5  Decomposition  of  Transition  Kernels  into  the  Standard  Form 


At this point, it is useful to express the transition kernel in the form which
comprises an eventual transition probability and a holding time density function, as
in Eq. (2.1). The parameters of the transition kernel elements in this form will be
used for calculating the parameters of the approximate Markov process governing
the class-to-class transitions. The two transition kernels in Eq. (3.4) and (3.7) will
be decomposed into the required form.

Eq. (3.4) can be rewritten as:

$$p_{21}(t) = \frac{\lambda_0^2}{(\lambda_0 + 3\epsilon)^2}\,(\lambda_0 + 3\epsilon)^2\,t\,e^{-(\lambda_0 + 3\epsilon)t} \tag{3.9}$$

The second factor on the RHS of the above equation is the conditional
holding time PDF of transitions from state 1 to state 2. The first factor will be expanded
in a power series in $\epsilon$ and higher order terms of $\epsilon$ will be neglected, that is:

$$\frac{\lambda_0^2}{(\lambda_0 + 3\epsilon)^2} = 1 - \frac{6\epsilon}{\lambda_0} + o(\epsilon) \tag{3.10}$$

Substituting in Eq. (3.9), it becomes

$$p_{21}(t) = \left\{ 1 - \frac{6\epsilon}{\lambda_0} \right\} (\lambda_0 + 3\epsilon)^2\,t\,e^{-(\lambda_0 + 3\epsilon)t} \tag{3.11}$$
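The truncation used in Eq. (3.10) is a binomial series in $3\epsilon/\lambda_0$. As a worked check, the first neglected term can be sized with the parameter values quoted later in Section 4.1 ($\lambda_0 = 0.001$, $\epsilon = 2.5\times10^{-6}$ sec$^{-1}$); the numbers below are computed here and are not quoted from the source.

```latex
\frac{\lambda_0^2}{(\lambda_0+3\epsilon)^2}
  = \left(1+\frac{3\epsilon}{\lambda_0}\right)^{-2}
  = 1 - \frac{6\epsilon}{\lambda_0} + \frac{27\epsilon^2}{\lambda_0^2} - \cdots,
\qquad
\frac{6\epsilon}{\lambda_0} = 0.015,
\qquad
\frac{27\epsilon^2}{\lambda_0^2} \approx 1.7\times10^{-4}
```

For these values the neglected second-order term is about two orders of magnitude smaller than the first-order correction.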


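The transition kernel element for transitions from state 2 to state 5 in
Eq. (3.7) can be rewritten as:

$$\begin{aligned}
p_{52}(t) &= \epsilon \left[\, \lambda_{W0}\lambda_{W1}t^2 + (\lambda_{W0} + \lambda_{W1})\,t + 1 \,\right] e^{-(\lambda_{W0} + \lambda_{W1} + 3\epsilon)t} \\
&= \epsilon\,\frac{2\lambda_{W0}\lambda_{W1}}{(\lambda_{W0} + \lambda_{W1} + 3\epsilon)^3} \cdot \frac{(\lambda_{W0} + \lambda_{W1} + 3\epsilon)^3}{2}\,t^2\,e^{-(\lambda_{W0} + \lambda_{W1} + 3\epsilon)t} \\
&\quad + \epsilon\,\frac{\lambda_{W0} + \lambda_{W1}}{(\lambda_{W0} + \lambda_{W1} + 3\epsilon)^2} \cdot (\lambda_{W0} + \lambda_{W1} + 3\epsilon)^2\,t\,e^{-(\lambda_{W0} + \lambda_{W1} + 3\epsilon)t} \\
&\quad + \epsilon\,\frac{1}{\lambda_{W0} + \lambda_{W1} + 3\epsilon} \cdot (\lambda_{W0} + \lambda_{W1} + 3\epsilon)\,e^{-(\lambda_{W0} + \lambda_{W1} + 3\epsilon)t}
\end{aligned} \tag{3.12}$$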


It can be seen that $p_{52}(t)$ comprises 3 terms, each of which is an "eventual
transition probability" times a "holding time density." Thus, the form comprises
more than one term, but it will be demonstrated in Section 4.3 that those terms
can be combined to yield the standard form for the evaluation of the
parameters of the approximate Markov process. A complete list of all the
transition kernel elements in the standard form is included in Appendix C.

3.6 Closure

In this chapter, the structure of an example fault-tolerant system has been
described. After stating the assumptions and defining all the states, a generalized
Markovian transition kernel matrix was constructed; it completely characterizes
the state probability evolution. It was shown that the non-perturbed system model
can be decomposed into three closed classes. Generally, any fault-tolerant system
model can be decomposed into such classes if each class contains the same number
of working instruments and failed instruments.

Chapter 4

Evaluation and Comparison of 9-state
Model Exact and Approximate
State Probability Histories

Approximate  Markov  process  theory  was  developed  in  Chapter  2  and  a  fault- 
tolerant  system  model  was  constructed  in  Chapter  3.  The  9-state  model  exact  and 
approximate  solutions  will  be  evaluated  and  compared  in  this  chapter.  In  the  first 
section,  the  state  probability  histories  will  be  calculated  by  a  semi-Markov 
approach.  From  these  results,  the  normalized  state  probability  distribution  that 
exists  within  each  class  and  the  total  probabilities  for  each  of  the  three  classes  will 
be  evaluated.  In  the  next  section,  the  elements  of  the  approximate  technique  will 
be  deduced.  That  is,  the  stationary  probability  distributions  of  the  non-perturbed 
process  in  each  class  and  the  parameters  of  the  Markov  process  that  approximates 
the behavior between the classes will be calculated. The "state" probability
histories of the approximate aggregated Markov process will then be evaluated
analytically. Then, the approximate state probability histories will be constructed
by combining these results with the stationary probability distributions within each
class. These exact and approximate results will be compared in Section 4.3.

4.1 9-state Model Numerical Results from Semi-Markov Approach

Because it is easier to calculate the state probability histories numerically
up to a certain number of time steps than analytically, the continuous
time system representation must first be discretized.
The interval transition probability matrix will be calculated by using the matrix
convolution sum in Eq. (1.4), and then the state probability vector at each time
point is calculated by using Eq. (1.3). The initial state probability vector in Eq.
(1.3) is assumed to be,

$$\pi(0) = [\, 1 \;\; 0 \;\; 0 \;\; 0 \;\; 0 \;\; 0 \;\; 0 \;\; 0 \;\; 0 \,]^T \tag{4.1}$$

because it is almost always the case that at the start of a mission all of the
instruments are working and all of the tests are initialized.
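The discretized calculation can be sketched as follows. Eq. (1.3) and (1.4) are not reproduced in this section, so the code uses the standard discrete-time semi-Markov recursion and assumes that it corresponds to them: the interval transition matrix at step n is the probability of still waiting in the starting state plus a convolution sum over the time of the first transition. The function name, the `kernel[m][j][i]` layout (transition from i to j at step m, matching the column convention of this report) and the two-state toy example are all assumptions made here.

```python
def interval_transition_matrices(kernel, n_steps):
    """Return phi[0..n_steps], where phi[n][j][i] = Pr{in j at step n | entered i at 0}.

    kernel[m][j][i] = Pr{first transition out of i is to j and occurs at step m};
    kernel[0] must be all zeros.
    """
    n = len(kernel[0])
    # survival[m][i] = Pr{no transition out of i during steps 1..m}
    survival = [[1.0] * n]
    for m in range(1, n_steps + 1):
        survival.append([survival[m - 1][i] - sum(kernel[m][j][i] for j in range(n))
                         for i in range(n)])
    phi = [[[1.0 if j == i else 0.0 for i in range(n)] for j in range(n)]]
    for step in range(1, n_steps + 1):
        cur = [[survival[step][i] if j == i else 0.0 for i in range(n)]
               for j in range(n)]
        for m in range(1, step + 1):          # time of the first transition
            for i in range(n):
                for k in range(n):            # state entered at step m
                    w = kernel[m][k][i]
                    if w:
                        for j in range(n):
                            cur[j][i] += phi[step - m][j][k] * w
        phi.append(cur)
    return phi

# Toy example (not the 9-state model): state 0 fails into trapping state 1
# with a geometric holding time of parameter P_FAIL per step.
P_FAIL, N_STEPS = 0.1, 20
kernel = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(N_STEPS + 1)]
for m in range(1, N_STEPS + 1):
    kernel[m][1][0] = P_FAIL * (1 - P_FAIL) ** (m - 1)

phi = interval_transition_matrices(kernel, N_STEPS)
pi0 = [1.0, 0.0]
pi_final = [sum(phi[N_STEPS][j][i] * pi0[i] for i in range(2)) for j in range(2)]
print(pi_final)
```

For the 9-state model the same routine would be fed the 26 kernel elements sampled at the chosen time step; the cost of the convolution grows with the square of the number of steps, which is one reason the step size is a compromise.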

A FORTRAN source program was written to calculate these quantities. The
failure rate $\epsilon$ of each of the instruments is assumed to be $2.5 \times 10^{-6}$ sec$^{-1}$, which is
equivalent to an MTTF of 111.1 hours. The two sequential tests employed by the
system are assumed to have the decision time density function parameters listed
below:

$\lambda_0 = 0.001$      $\lambda_{W0} = 0.05$      $\lambda_{F0} =$ [illegible]

$\lambda_1 = 0.05$       $\lambda_{W1} = 0.1$       $\lambda_{F1} = 0.05$

Note that the smallest of these values (0.001) corresponds to an approximate mean
time between events of 0.278 hours, which is a factor of 400 (between 2 and 3 orders
of magnitude) shorter than the 111.1 hours MTTF of each component.

The program used double precision variables exclusively, and was run on a
computer system at the Massachusetts Institute of Technology. The
time step size for the discretized model was chosen as 4 seconds as a compromise
between the desired mission length and the accuracy of the solution. State
probability histories up to 800 time steps were calculated. This is equivalent to a
mission time of 3200 seconds or just under one hour. The state probabilities at
various time points between 160 seconds and 3200 seconds are shown in Figure 4-1.
The evolution of the probability of occupying state 9, which is the system
unreliability, is illustrated in Figure 4-2. From the state probability histories, the
class probability histories can be calculated by summing the state probabilities for
the states within each class. These aggregated probability histories, which will later
be compared with the "state" probability histories of the approximate aggregated
Markov process, are shown in Figure 4-3 for each class. The evolution of
these probabilities for the 1st class and the 2nd class is plotted in Figures 4-4 and
4-5, respectively.
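The time scales quoted in this section follow from simple arithmetic, sketched below. The constant names are illustrative; the values are the ones stated in the text.

```python
EPSILON = 2.5e-6      # failure rate of each instrument, 1/sec
LAMBDA_0 = 0.001      # smallest decision time parameter, 1/sec
STEP_SECONDS = 4      # time step of the discretized model
N_STEPS = 800         # number of time steps computed

mttf_hours = 1.0 / EPSILON / 3600.0       # mean time to failure of one instrument
event_hours = 1.0 / LAMBDA_0 / 3600.0     # approximate mean time between RM events
mission_seconds = N_STEPS * STEP_SECONDS  # length of the computed history

print(round(mttf_hours, 1), round(event_hours, 3), mission_seconds)
```

The wide separation between the failure time scale and the RM decision time scale is what the approximate Markov process developed in Chapter 2 relies on.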

The state probability distribution of the original process will be approximated
by expanding the "state" probability distribution of the approximate Markov
process with the stationary probability distributions of the non-perturbed process
within each class, as in Eq. (1.5). Therefore, one way to measure how good the
approximation is, provided the approximate Markov process gives the exact class
probability distribution, is to observe how quickly and how accurately the exact
normalized probability distributions³ in classes 1 and 2 approach the stationary
probability distributions for these two classes. The state probability
distributions calculated above were therefore normalized, and the results are shown
in Figure 4-6.

³ The normalized probability distribution in a class is calculated by dividing the probability
distribution elements by the total probability of occupying that class.

Figure 4-1: Exact* state probability histories of the 9-state model.
(* to within numerical round-off error)

Figure 4-2: System loss probability history.

Figure 4-3: Exact class probabilities history of 9-state model.

Figure 4-5: Exact class 2 probability history.

Figure 4-6: Exact normalized probability distribution histories for
classes 1 and 2 of the 9-state model.

4.2 Approximate State Probability Histories for the 9-State Model

4.2.1 Imbedded Markov Chains

It was shown in the last chapter that, when $\epsilon = 0$, the 9-state model
decomposes into a non-perturbed model consisting of 3 closed semi-Markov chains.
The eventual transition probabilities of these non-perturbed semi-Markov processes
completely define the imbedded Markov chains. With the numerical values of the
parameters of the model listed in Section 4.1, the transition probability matrix of
the imbedded Markov chain is found and shown in Eq. (4.2).

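$$P = \begin{bmatrix}
0 & 0 & 0.7407 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 0.2593 & 0.2593 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.7407 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.2593 & 0 & 0.7407 & 0 \\
0 & 0 & 0 & 0.9 & 0.7407 & 0.7407 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.2593 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.1 & 0 & 0 & 0.2593 & 0.2593 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.7407 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix} \tag{4.2}$$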
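The entries 0.2593 and 0.7407 in Eq. (4.2) can be reproduced from the Section 4.1 parameters. With $\epsilon = 0$, the self-test outcome is a race between two independent 2nd order Erlang decision times; integrating one density against the survival function of the other gives the closed form used below. That closed form is worked out here as a check and is not quoted from the source, and the function name is hypothetical.

```python
def prob_first(lam_a, lam_b):
    """Probability that an Erlang-2 time with rate lam_a precedes an
    independent Erlang-2 time with rate lam_b.

    Integral of lam_a^2 t e^(-lam_a t) (1 + lam_b t) e^(-lam_b t) over t >= 0.
    """
    s = lam_a + lam_b
    return lam_a ** 2 * (lam_a + 3 * lam_b) / s ** 3

LAM_W0, LAM_W1 = 0.05, 0.1   # failed / unfailed indication, no failure present

p22 = prob_first(LAM_W0, LAM_W1)   # failed indication first: stay in state 2
p32 = prob_first(LAM_W1, LAM_W0)   # unfailed indication first: move to state 3

print(round(p22, 4), round(p32, 4))
```

The two probabilities sum to one, as they must for a column of the imbedded chain when no failure can occur.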


The transition probability matrix is raised to successively higher powers to
characterize the behavior of the imbedded process after many transitions. It was
found that when the power exceeds 40, a stationary interval transition probability
matrix establishes itself as in Eq. (4.3).

$$P^{m} = \begin{bmatrix}
0.2397 & 0.2397 & 0.2397 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4368 & 0.4368 & 0.4368 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.3235 & 0.3235 & 0.3235 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.0550 & 0.0550 & 0.0550 & 0.0550 & 0.0550 & 0 \\
0 & 0 & 0 & 0.7366 & 0.7366 & 0.7366 & 0.7366 & 0.7366 & 0 \\
0 & 0 & 0 & 0.1910 & 0.1910 & 0.1910 & 0.1910 & 0.1910 & 0 \\
0 & 0 & 0 & 0.0100 & 0.0100 & 0.0100 & 0.0100 & 0.0100 & 0 \\
0 & 0 & 0 & 0.0074 & 0.0074 & 0.0074 & 0.0074 & 0.0074 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix} \tag{4.3}$$

By a result in Markov process theory [3], it can be concluded that the
decoupled imbedded Markov chains for each class are ergodic with the stationary
probability vectors in each class being,

$$\pi_M^{(1)} = [\, 0.2397 \;\; 0.4368 \;\; 0.3235 \,]^T \tag{4.4}$$

$$\pi_M^{(2)} = [\, 0.0550 \;\; 0.7366 \;\; 0.1910 \;\; 0.0100 \;\; 0.0074 \,]^T \tag{4.5}$$

$$\pi_M^{(3)} = [\, 1 \,]^T \tag{4.6}$$

As a result, the second condition stated in Chapter 2 is satisfied by the 9-state
model and the approximate Markov process will be valid.
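The stationary vector of the class 1 imbedded chain can be reproduced by repeated multiplication, as described above. The sketch below uses the class 1 block of Eq. (4.2) with the entries written as 7/27 and 20/27; these unrounded values of 0.2593 and 0.7407 are an assumption made here so that each column sums to exactly one.

```python
A, B = 20.0 / 27.0, 7.0 / 27.0   # assumed unrounded values of 0.7407 and 0.2593

# Class 1 block of the imbedded chain; column i holds the probabilities out of state i.
P1 = [[0.0, 0.0, A],
      [1.0, B,   B],
      [0.0, A,   0.0]]

def matvec(matrix, vector):
    """Column-convention product: result[j] = sum_i matrix[j][i] * vector[i]."""
    return [sum(matrix[j][i] * vector[i] for i in range(len(vector)))
            for j in range(len(matrix))]

v = [1.0, 0.0, 0.0]              # start in state 1
for _ in range(200):             # far more than the 40 steps quoted in the text
    v = matvec(P1, v)

print([round(x, 4) for x in v])
```

The iteration settles on the class 1 stationary vector regardless of the starting state, which is the ergodicity property the argument above depends on.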


4.2.2  Stationary  Probability  Distribution  of  the  Non-Perturbed  Process 


The stationary probability distribution of the non-perturbed process is needed
to expand the approximate Markov process results in order to approximate the
state probability distribution of the original process. By semi-Markov theory, this
stationary probability distribution for each non-perturbed class is given by

$$\pi_i = \frac{\pi_{M_i}\,\bar{\tau}_i}{\bar{\tau}} \tag{4.7}$$

where $\bar{\tau}$ is the mean waiting time of the process:

$$\bar{\tau} = \sum_i \pi_{M_i}\,\bar{\tau}_i \tag{4.8}$$

and where $\pi_{M_i}$ is the stationary probability for state i of the imbedded Markov
process that is characterized by the eventual transition probability matrix of the
semi-Markov process. $\bar{\tau}_i$ is the mean waiting time in state i and it is given by,

$$\bar{\tau}_i = \sum_j p_{ji}\,\bar{\tau}_{ji} \tag{4.9}$$

where $p_{ji}$, with the same notation as before, is the eventual transition probability from
state i to state j and $\bar{\tau}_{ji}$ is the mean holding time for transitions from state i to
state j, which is defined by,

$$\bar{\tau}_{ji} = \int_0^{\infty} t\,h_{ji}(t)\,dt \tag{4.10}$$


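The calculation of the stationary probability distribution of the non-perturbed
semi-Markov chain in class 1 is demonstrated here and that of class 2 is
included in Appendix D.

For the non-perturbed process in class 1, state 1 can transit only to state 2,
state 2 only to 2 and 3, and state 3 only to 1 and 2. The mean holding time for
transitions from state i to state j is derived as follows:

Since $p_{22}(t)$ is not in the simplest form, $\bar{\tau}_{22}$ will be derived here. From the
transition kernel matrix,

$$p_{22}(t) = p_{22_1}\,h_{22_1}(t) + p_{22_2}\,h_{22_2}(t)$$

where

$$p_{22_1} = \frac{2\lambda_{W0}^2\,\lambda_{W1}}{(\lambda_{W0} + \lambda_{W1})^3}, \qquad p_{22_2} = \frac{\lambda_{W0}^2}{(\lambda_{W0} + \lambda_{W1})^2}$$

$$h_{22_1}(t) = \tfrac{1}{2}\,(\lambda_{W0} + \lambda_{W1})^3\,t^2\,e^{-(\lambda_{W0} + \lambda_{W1})t}$$

$$h_{22_2}(t) = (\lambda_{W0} + \lambda_{W1})^2\,t\,e^{-(\lambda_{W0} + \lambda_{W1})t}$$

This can be rewritten in standard form as:

$$p_{22}(t) = p_{22} \left[\, \frac{p_{22_1}}{p_{22}}\,h_{22_1}(t) + \frac{p_{22_2}}{p_{22}}\,h_{22_2}(t) \,\right] \tag{4.11}$$

where

$$p_{22} = p_{22_1} + p_{22_2}$$

Note that any kernel element given by a sum of terms can be treated similarly. So
by definition, the mean holding time $\bar{\tau}_{22}$ is given by:

$$\bar{\tau}_{22} = \int_0^{\infty} t \left[\, \frac{p_{22_1}}{p_{22}}\,h_{22_1}(t) + \frac{p_{22_2}}{p_{22}}\,h_{22_2}(t) \,\right] dt = \frac{3\,p_{22_1}}{p_{22}\,(\lambda_{W0} + \lambda_{W1})} + \frac{2\,p_{22_2}}{p_{22}\,(\lambda_{W0} + \lambda_{W1})} \tag{4.12}$$

By a similar approach,

$$\bar{\tau}_{21} = \frac{2}{\lambda_0}$$

$$\bar{\tau}_{32} = \frac{3\,p_{32_1}}{p_{32}\,(\lambda_{W0} + \lambda_{W1})} + \frac{2\,p_{32_2}}{p_{32}\,(\lambda_{W0} + \lambda_{W1})}$$

where

$$p_{32_1} = \frac{2\lambda_{W1}^2\,\lambda_{W0}}{(\lambda_{W0} + \lambda_{W1})^3}, \qquad p_{32_2} = \frac{\lambda_{W1}^2}{(\lambda_{W0} + \lambda_{W1})^2}$$

$$\bar{\tau}_{13} = \bar{\tau}_{32}$$

$$\bar{\tau}_{23} = \bar{\tau}_{22}$$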
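The mean holding times for class 1 can be evaluated directly from the expressions above. The sketch below assumes the Section 4.1 parameter values and uses the facts that a 3rd order Erlang density has mean 3 divided by its rate and a 2nd order Erlang density has mean 2 divided by its rate; variable names are illustrative.

```python
LAM_0, LAM_W0, LAM_W1 = 0.001, 0.05, 0.1   # Section 4.1 values, 1/sec
S = LAM_W0 + LAM_W1                        # common rate of the holding time densities

# Weights of the Erlang-3 and Erlang-2 parts of each kernel element (epsilon = 0).
p22_1 = 2 * LAM_W0 ** 2 * LAM_W1 / S ** 3
p22_2 = LAM_W0 ** 2 / S ** 2
p32_1 = 2 * LAM_W1 ** 2 * LAM_W0 / S ** 3
p32_2 = LAM_W1 ** 2 / S ** 2
p22 = p22_1 + p22_2
p32 = p32_1 + p32_2

# Mean holding times: Erlang-3 mean is 3/S, Erlang-2 mean is 2/S.
tau_21 = 2.0 / LAM_0
tau_22 = (3 * p22_1 + 2 * p22_2) / (p22 * S)
tau_32 = (3 * p32_1 + 2 * p32_2) / (p32 * S)

# Mean waiting times in states 1 and 2 (p21 = 1 when epsilon = 0).
tau_1 = 1.0 * tau_21
tau_2 = p22 * tau_22 + p32 * tau_32

print(tau_1, round(tau_2, 3))
```

Because state 3 has the same pair of competing self-test outcomes as state 2, its mean waiting time takes the same value.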


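From Eq. (4.9), with the numerical values of the parameters and statistics of
the 9-state model kernels substituted, the mean waiting times of the states in class
1 are,

$$\bar{\tau}_1 = p_{21}\,\bar{\tau}_{21} = 2000 \text{ seconds}$$

$$\bar{\tau}_2 = p_{22}\,\bar{\tau}_{22} + p_{32}\,\bar{\tau}_{32} = 16.296 \text{ seconds}$$

$$\bar{\tau}_3 = p_{13}\,\bar{\tau}_{13} + p_{23}\,\bar{\tau}_{23} = 16.296 \text{ seconds}$$

With $\pi_M^{(1)}$ given in Eq. (4.4), $\bar{\tau}$ is given by:

$$\bar{\tau} = \sum_{i=1}^{3} \pi_{M_i}\,\bar{\tau}_i = 491.790 \text{ seconds}$$

Then the stationary probability distribution in class 1 is,

$$\pi_1 = \frac{\pi_{M_1}\,\bar{\tau}_1}{\bar{\tau}} = 0.9748$$

$$\pi_2 = \frac{\pi_{M_2}\,\bar{\tau}_2}{\bar{\tau}} = 0.0145$$

$$\pi_3 = \frac{\pi_{M_3}\,\bar{\tau}_3}{\bar{\tau}} = 0.0107$$

or,

$$\pi^{(1)} = [\, 0.9748 \;\; 0.0145 \;\; 0.0107 \,]^T \tag{4.13}$$

The stationary probability distribution in class 2 is evaluated in Appendix D and
the result is as follows:

$$\pi^{(2)} = [\, 0.1250 \;\; 0.6820 \;\; 0.1768 \;\; 0.0093 \;\; 0.0069 \,]^T \tag{4.14}$$

Class 3 consists of only one state, so the stationary probability distribution is,

$$\pi^{(3)} = [\, 1 \,]^T \tag{4.15}$$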
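The class 1 numbers above can be reproduced in a few lines. The sketch takes the imbedded-chain stationary probabilities of Eq. (4.4) and the mean waiting times just quoted as inputs; the variable names are illustrative.

```python
PI_M = [0.2397, 0.4368, 0.3235]       # imbedded chain stationary vector, Eq. (4.4)
TAU_I = [2000.0, 16.296, 16.296]      # mean waiting times in states 1, 2, 3 (seconds)

# Mean waiting time of the process: sum of pi_M_i * tau_i.
tau_bar = sum(p * t for p, t in zip(PI_M, TAU_I))

# Stationary distribution of the semi-Markov process: weight each imbedded
# probability by the fraction of time spent in that state.
pi_class1 = [p * t / tau_bar for p, t in zip(PI_M, TAU_I)]

print(round(tau_bar, 3), [round(x, 4) for x in pi_class1])
```

State 1 dominates because its mean waiting time is two orders of magnitude longer than those of states 2 and 3, even though the imbedded chain visits it least often.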

4.2.3  Approximate  Markov  process 

The Laplace transforms of the kernel elements of the approximate Markov
process were derived in Chapter 2 and they involve the time scaling factor $\delta$. But $\delta$
is only the scaling factor relating the temporal scales of the two processes. It can
be set to any value and the resulting $\phi_{rk}(s)$ will be different for different values of
$\delta$. The enlarged process hence deduced will be related to the original process by the
time scale factor $\delta$ set in the derivation of $\phi_{rk}(s)$.

The parameters $\delta$, $\epsilon$, $p_{rk}$ and $\Lambda_k$ in Eq. (2.19) for kernel element $\phi_{rk}(s)$
completely define the enlarged process, and $p_{rk}$ and $\Lambda_k$ can be derived [...]
parameters of the original semi-Markov process, as shown in [...]
with the result that the enlarged process approximates the [...]
semi-Markov process in a new time scale and any [...]
enlarged process must be scaled to the original time [...]
representation of the original process is not [...]
and it will not be derived here. What [...]
the enlarged process of the 9-state model [...]
$2.5 \times 10^{-6}$ sec$^{-1}$, the same value as the failure rate $\epsilon$ of each instrument.
Then Eq. (2.19) is reduced to,

$$\phi_{rk}(s) = p_{rk}\,\frac{\Lambda_k}{s + \Lambda_k} \tag{4.16}$$

which is exactly the result given in [4].

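The procedure for calculating $\Lambda_1$ is as follows (numerical results are quoted
from Appendix C):

$$q_1^{(1)} = \sum_{j \in E_1} q_{j1}^{(1)} = 6000 \tag{4.17}$$

$$q_2^{(1)} = \sum_{j \in E_1} q_{j2}^{(1)} = 13.333 + 35.555 = 48.888 \tag{4.18}$$

$$q_3^{(1)} = \sum_{j \in E_1} q_{j3}^{(1)} = q_{13}^{(1)} + q_{23}^{(1)} = 35.555 + 13.333 = 48.888 \tag{4.19}$$

$$a_1^{(1)} = \sum_{j \in E_1} a_{j1}\,p_{j1}^{(1)} = a_{21}\,p_{21}^{(1)} = 2000 \tag{4.20}$$

$$a_2^{(1)} = \sum_{j \in E_1} a_{j2}\,p_{j2}^{(1)} = a_{22}\,p_{22}^{(1)} + a_{32}\,p_{32}^{(1)} = 16.296 \tag{4.21}$$

$$a_3^{(1)} = \sum_{j \in E_1} a_{j3}\,p_{j3}^{(1)} = a_{13}\,p_{13}^{(1)} + a_{23}\,p_{23}^{(1)} = 16.296 \tag{4.22}$$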


Substituting Eq. (4.17) to (4.22) into Eq. (2.22), then

$$\Lambda_1 = \frac{6000\,\pi_{M_1} + 48.888\,\pi_{M_2} + 48.888\,\pi_{M_3}}{2000\,\pi_{M_1} + 16.296\,\pi_{M_2} + 16.296\,\pi_{M_3}} = 3 \tag{4.23}$$
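Eq. (4.23) is a ratio of two weighted sums and can be checked directly. The sketch below takes the imbedded-chain stationary probabilities of class 1 and the six numerical values quoted in Eq. (4.17) to (4.22) as inputs; the list names are illustrative labels, not the notation of Chapter 2.

```python
PI_M = [0.2397, 0.4368, 0.3235]           # class 1 imbedded stationary vector
NUMERATOR_TERMS = [6000.0, 48.888, 48.888]  # values of Eq. (4.17) to (4.19)
MEAN_WAITS = [2000.0, 16.296, 16.296]       # values of Eq. (4.20) to (4.22)

lambda_1 = (sum(p * q for p, q in zip(PI_M, NUMERATOR_TERMS))
            / sum(p * a for p, a in zip(PI_M, MEAN_WAITS)))

print(lambda_1)
```

The ratio is 3 for any choice of weights, because each numerator value is three times the corresponding mean waiting time. This is consistent with three unfailed instruments, each failing at rate $\epsilon$, once the time axis is scaled by $\epsilon$.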


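The procedure for obtaining $p_{21}$ is as follows:

$$q_1^{(21)} = \sum_{j \in E_2} q_{j1}^{(21)} = 6000 \tag{4.24}$$

$$q_2^{(21)} = \sum_{j \in E_2} q_{j2}^{(21)} = 48.888 \tag{4.25}$$

$$q_3^{(21)} = \sum_{j \in E_2} q_{j3}^{(21)} = 48.888 \tag{4.26}$$

Substituting Eq. (4.17) to (4.19) and Eq. (4.24) to (4.26) into Eq. (2.21), then

$$p_{21} = \frac{6000\,\pi_{M_1} + 48.888\,\pi_{M_2} + 48.888\,\pi_{M_3}}{6000\,\pi_{M_1} + 48.888\,\pi_{M_2} + 48.888\,\pi_{M_3}} = 1 \tag{4.27}$$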


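$p_{21}$ equal to 1 implies that $p_{31}$ equals 0, due to the fact that the sum of the eventual
transition probabilities exiting a state must be 1.

The procedure for obtaining $\Lambda_2$ is as follows:

$$q_4^{(2)} = \sum_{j \in E_2} q_{j4}^{(2)} = 80 \tag{4.28}$$

$$q_5^{(2)} = \sum_{j \in E_2} q_{j5}^{(2)} = 32.593 \tag{4.29}$$

$$q_6^{(2)} = \sum_{j \in E_2} q_{j6}^{(2)} = 32.593 \tag{4.30}$$

$$q_7^{(2)} = \sum_{j \in E_2} q_{j7}^{(2)} = 32.593 \tag{4.31}$$

$$q_8^{(2)} = \sum_{j \in E_2} q_{j8}^{(2)} = 32.593 \tag{4.32}$$

$$a_4^{(2)} = \sum_{j \in E_2} a_{j4}\,p_{j4}^{(2)} = a_{54}\,p_{54}^{(2)} + a_{74}\,p_{74}^{(2)} = 40 \tag{4.33}$$

$$a_5^{(2)} = \sum_{j \in E_2} a_{j5}\,p_{j5}^{(2)} = a_{55}\,p_{55}^{(2)} + a_{65}\,p_{65}^{(2)} = 16.296 \tag{4.34}$$

$$a_6^{(2)} = \sum_{j \in E_2} a_{j6}\,p_{j6}^{(2)} = a_{46}\,p_{46}^{(2)} + a_{56}\,p_{56}^{(2)} = 16.296 \tag{4.35}$$

$$a_7^{(2)} = \sum_{j \in E_2} a_{j7}\,p_{j7}^{(2)} = a_{77}\,p_{77}^{(2)} + a_{87}\,p_{87}^{(2)} = 16.296 \tag{4.36}$$

$$a_8^{(2)} = \sum_{j \in E_2} a_{j8}\,p_{j8}^{(2)} = a_{48}\,p_{48}^{(2)} + a_{78}\,p_{78}^{(2)} = 16.296 \tag{4.37}$$

Substituting Eq. (4.28) to (4.37) into Eq. (2.22), then

$$\Lambda_2 = \frac{80\,\pi_{M_4} + 32.593\,\pi_{M_5} + 32.593\,\pi_{M_6} + 32.593\,\pi_{M_7} + 32.593\,\pi_{M_8}}{40\,\pi_{M_4} + 16.296\,\pi_{M_5} + 16.296\,\pi_{M_6} + 16.296\,\pi_{M_7} + 16.296\,\pi_{M_8}} = 2 \tag{4.38}$$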


The procedure for obtaining $p_{32}$ is as follows:

$$q_4^{(32)} = \sum_{j \in E_3} q_{j4}^{(32)} = 80 \tag{4.39}$$

$$q_5^{(32)} = \sum_{j \in E_3} q_{j5}^{(32)} = 32.593 \tag{4.40}$$

$$q_6^{(32)} = \sum_{j \in E_3} q_{j6}^{(32)} = 32.593 \tag{4.41}$$

$$q_7^{(32)} = \sum_{j \in E_3} q_{j7}^{(32)} = 32.593 \tag{4.42}$$

$$q_8^{(32)} = \sum_{j \in E_3} q_{j8}^{(32)} = 32.593 \tag{4.43}$$

Substituting Eq. (4.28) to (4.32) and Eq. (4.39) to (4.43) into Eq. (2.21), then

$$p_{32} = \frac{80\,\pi_{M_4} + 32.593\,\pi_{M_5} + 32.593\,\pi_{M_6} + 32.593\,\pi_{M_7} + 32.593\,\pi_{M_8}}{80\,\pi_{M_4} + 32.593\,\pi_{M_5} + 32.593\,\pi_{M_6} + 32.593\,\pi_{M_7} + 32.593\,\pi_{M_8}} = 1 \tag{4.44}$$

$p_{32}$ equal to 1 implies that $p_{12}$ equals zero.

Since class 3 is a trapping class, it will not affect the result if it is assumed that the numerical values of $\Lambda_3$ and $p_{33}$ are both 1. Collecting all the results obtained above, the Laplace transform of the transition kernel matrix of the approximate aggregated Markov process is as follows:

$$\begin{bmatrix} 0 & 0 & 0 \\ \dfrac{3}{s+3} & 0 & 0 \\ 0 & \dfrac{2}{s+2} & \dfrac{1}{s+1} \end{bmatrix} \qquad (4.45)$$


or  in  the  scaled  time  domain, 
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0 
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(4.46) 
By semi-Markov theory, the Laplace transform of the interval transition matrix can be expressed as follows:

$$\Phi(s) = [\, sI + (I - P)\Lambda \,]^{-1} \qquad (4.47)$$

where s is the Laplace operator, I is the identity matrix, P is the eventual transition probability matrix and $\Lambda$ is a diagonal matrix whose i-th element is the exponential transition rate out of "state" i. So, for the approximate Markov process for the 9-state model, P and $\Lambda$ are as follows:

$$P = \begin{bmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 1 \end{bmatrix} \qquad (4.48)$$

$$\Lambda = \begin{bmatrix} 3 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{bmatrix} \qquad (4.49)$$


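Substituting Eq. (4.48) and (4.49) into Eq. (4.47) yields, after some manipulations,

$$\Phi^{e}(s) = \begin{bmatrix} \dfrac{1}{s+3} & 0 & 0 \\ \dfrac{3}{(s+2)(s+3)} & \dfrac{1}{s+2} & 0 \\ \dfrac{6}{s(s+2)(s+3)} & \dfrac{2}{s(s+2)} & \dfrac{1}{s} \end{bmatrix} \qquad (4.50)$$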


or in the scaled time domain,

$$\Phi^{e}(t') = \begin{bmatrix} e^{-3t'} & 0 & 0 \\ 3\,(e^{-2t'} - e^{-3t'}) & e^{-2t'} & 0 \\ 1 - 3e^{-2t'} + 2e^{-3t'} & 1 - e^{-2t'} & 1 \end{bmatrix} \qquad (4.51)$$

Since the initial state probability vector used in the exact state probability distribution histories calculation was assumed to be $\pi(0) = [\,1\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\,]^T$, the initial state probability vector for the approximate aggregated Markov process will be,

$$\pi^{e}(0) = [\,1\ 0\ 0\,]^T \qquad (4.52)$$


By Eq. (1.1), the "state" probabilities of the approximate aggregated Markov process are,

$$\pi^{e}_1(t') = e^{-3t'} \qquad (4.53)$$

$$\pi^{e}_2(t') = 3\,(e^{-2t'} - e^{-3t'}) \qquad (4.54)$$

$$\pi^{e}_3(t') = 1 - 3e^{-2t'} + 2e^{-3t'} \qquad (4.55)$$
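The closed-form "state" probabilities above can be checked against a direct numerical integration of the aggregated process. The following is a minimal sketch, assuming the column convention used in this chapter (element (i, j) describes the transition from "state" j to "state" i) and the values $p_{21} = p_{32} = p_{33} = 1$ and exit rates 3, 2 and 1. The function names and the Runge-Kutta integrator are illustrative choices, not part of the original computation.

```python
import math

# Aggregated 3-"state" Markov process in the scaled time t'.
# P[i][j] is the eventual probability of going from "state" j+1 to "state" i+1.
P = [[0.0, 0.0, 0.0],
     [1.0, 0.0, 0.0],
     [0.0, 1.0, 1.0]]
RATES = [3.0, 2.0, 1.0]  # exponential exit rates of "states" 1, 2, 3


def derivative(pi):
    """Right-hand side of d(pi)/dt' = (P - I) * Lambda * pi."""
    flow = [RATES[j] * pi[j] for j in range(3)]
    return [sum(P[i][j] * flow[j] for j in range(3)) - flow[i] for i in range(3)]


def integrate(t_end, steps=2000):
    """Classical fourth-order Runge-Kutta integration from pi(0) = [1, 0, 0]."""
    pi = [1.0, 0.0, 0.0]
    h = t_end / steps
    for _ in range(steps):
        k1 = derivative(pi)
        k2 = derivative([p + 0.5 * h * k for p, k in zip(pi, k1)])
        k3 = derivative([p + 0.5 * h * k for p, k in zip(pi, k2)])
        k4 = derivative([p + h * k for p, k in zip(pi, k3)])
        pi = [p + h * (a + 2 * b + 2 * c + d) / 6
              for p, a, b, c, d in zip(pi, k1, k2, k3, k4)]
    return pi


def closed_form(t):
    """Closed-form "state" probabilities in the scaled time t'."""
    e2, e3 = math.exp(-2 * t), math.exp(-3 * t)
    return [e3, 3 * (e2 - e3), 1 - 3 * e2 + 2 * e3]


if __name__ == "__main__":
    print(integrate(1.0))
    print(closed_form(1.0))
```

Both routines should agree to many decimal places, which is a quick way to catch a sign or index slip in the matrix inversion.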

The argument t' is used here in order to distinguish the different time scale used for the approximate Markov process. The original semi-Markov 9-state model is defined in a faster time scale, denoted t. If the $\pi^{e}_i(t')$ are expressed in this original temporal scale, then Eq. (4.53) to (4.55) will, in general, become:

$$\pi^{e}_1(t) = e^{-3\delta t} \qquad (4.56)$$

$$\pi^{e}_2(t) = 3\,(e^{-2\delta t} - e^{-3\delta t}) \qquad (4.57)$$

$$\pi^{e}_3(t) = 1 - 3e^{-2\delta t} + 2e^{-3\delta t} \qquad (4.58)$$

or, in this example where $\delta = \epsilon$, these three equations become,

$$\pi^{e}_1(t) = e^{-3\epsilon t} \qquad (4.59)$$

$$\pi^{e}_2(t) = 3\,(e^{-2\epsilon t} - e^{-3\epsilon t}) \qquad (4.60)$$

$$\pi^{e}_3(t) = 1 - 3e^{-2\epsilon t} + 2e^{-3\epsilon t} \qquad (4.61)$$

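By expanding the approximate Markov process with the stationary probability distributions of the non-perturbed decoupled processes obtained in Eq. (4.13) to Eq. (4.15), the approximate probability distribution for the 9-state model is,

$$\pi(t) \approx \begin{bmatrix} \begin{bmatrix} 0.9748 \\ 0.0145 \\ 0.0107 \end{bmatrix} e^{-3\epsilon t} \\[6pt] \begin{bmatrix} 0.1250 \\ 0.6820 \\ 0.1768 \\ 0.0093 \\ 0.0069 \end{bmatrix} 3\,(e^{-2\epsilon t} - e^{-3\epsilon t}) \\[6pt] [\,1\,]\,(1 - 3e^{-2\epsilon t} + 2e^{-3\epsilon t}) \end{bmatrix} \qquad (4.62)$$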
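The expansion step can be sketched in a few lines: each class probability of the aggregated process is multiplied by the analytical stationary distribution of the states inside that class. The value of epsilon used below (2.5e-6 per second) is an assumption, chosen because it reproduces the class probabilities tabulated later in this chapter; the function names are illustrative.

```python
import math

EPS = 2.5e-6  # assumed perturbation parameter, in 1/sec

# Analytical stationary distributions of the non-perturbed processes,
# one list per class (states 1-3, states 4-8, state 9).
STATIONARY = [
    [0.9748, 0.0145, 0.0107],
    [0.1250, 0.6820, 0.1768, 0.0093, 0.0069],
    [1.0],
]


def class_probabilities(t, eps=EPS):
    """Class probabilities of the approximate aggregated Markov process."""
    e2, e3 = math.exp(-2 * eps * t), math.exp(-3 * eps * t)
    return [e3, 3 * (e2 - e3), 1 - 3 * e2 + 2 * e3]


def state_probabilities(t, eps=EPS):
    """Approximate 9-state distribution: class probability times in-class stationary share."""
    result = []
    for weight, distribution in zip(class_probabilities(t, eps), STATIONARY):
        result.extend(weight * share for share in distribution)
    return result


if __name__ == "__main__":
    for state, value in enumerate(state_probabilities(3200.0), start=1):
        print(state, f"{value:.6e}")
```

Because each in-class distribution sums to one, the nine approximate state probabilities sum to one at every time point.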


4.3  Comparison  and  Discussion  of  Results 

The accuracy of the approximate approach depends on two key factors. The first factor is how quickly and how accurately the normalized probability distribution in each class of the 9-state model converges to the non-perturbed stationary probability distribution. The second factor is how accurate the "state" probabilities of the approximate aggregated Markov process are relative to the class probabilities of the 9-state model. The comparison of results for the example system in these two aspects follows:

First, the normalized probability distributions at the end of the mission, i.e. at t = 3200 sec., obtained in Figure 4-1 and the analytical non-perturbed stationary probability distributions obtained in Section 4.2.2 are compared in Figure 4-7. The largest and the smallest relative percentage errors occur in states 3 and 5, respectively. The normalized probability trajectories for states 3 and 5 are plotted in Figures 4-8 and 4-9 along with the corresponding analytical stationary probability distribution value (a constant in each case).

In Figure 4-8 the state probability trajectory in state 3 starts to converge to within 12% of the stationary value from t = 800 sec. onward and at the end of the mission it converges to a value of 0.012, which is higher than the stationary probability. In Figure 4-9, the state 5 normalized probability trajectory converges faster, to within 10% of the stationary probability from t = 350 sec. onward, and converges to within 0.5%, or to a value of 0.68, at the end of the mission. The main contribution to the large percentage error of the normalized probabilities in states 2 and 3 relative to the analytical stationary probabilities is the large step size chosen for the discretization of the 9-state model (the step size was chosen as a compromise between accuracy and mission length). To illustrate this point, the




class  state  normalized     stationary     stationary     relative %
              probability    probability    probability    error
              distribution   distribution   distribution
                             (numerical)    (analytical)

1      1      0.9718         0.9719         0.9748         -3.0
       2      0.0161         0.0162         0.0145         11.7
       3      0.0119         0.0120         0.0107         11.7

2      4      0.1283         0.1179         0.1250          2.7
       5      0.6789         0.6876         0.6820         -0.5
       6      0.1749         0.1783         0.1768         -1.0
       7      0.0101         0.0094         0.0093          9.6
       8      0.0075         0.0069         0.0069          9.7

Figure 4-7: Comparison of normalized probability distribution at t = 3200 sec. and stationary probability distribution of the non-perturbed process

stationary state probability distribution within each class was obtained numerically by running the 9-state model program for 800 time steps with $\epsilon = 0$ and $\pi(0) = [\,1\ 0\ 0\ 1\ 0\ 0\ 0\ 0\ 1\,]^T$. The result is the numerical stationary probability distribution of the non-perturbed process, which is shown in Figure 4-7. The class 1 normalized probability distribution converges to the numerical stationary probability distribution rather than the analytical stationary probability distribution. From this, it can be concluded that the baseline results for the normalized probability distribution are in error due to computational effects in the discretization of the governing matrix of the model.


The class probability trajectories from the semi-Markov approach were obtained in Section 4.1 and the "state" probabilities of the approximate aggregated Markov process in closed form were deduced in Section 4.2.3. The results of these two different approaches at 40, 80, and 800 time steps are compared in Figure 4-10.


t/sec.   class   numerical semi-Markov   approximate Markov
                 approach                process technique

160      1       0.9988                  0.9988
         2       0.1198e-2               0.1199e-2
         3       0.4327e-6               0.4804e-6

320      1       0.9976                  0.9976
         2       0.2393e-2               0.2395e-2
         3       0.1728e-5               0.1918e-5

3200     1       0.9764                  0.9763
         2       0.2346e-1               0.2352e-1
         3       0.1698e-3               0.1895e-3

Figure 4-10: Comparison of class probability obtained from numerical semi-Markov approach and from approximate Markov process technique
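The approximate column of Figure 4-10 can be regenerated from the closed-form class probabilities. The sketch below assumes $\epsilon = 2.5 \times 10^{-6}$ per second, a value inferred here from the tabulated numbers rather than quoted from the model definition; the function name is illustrative.

```python
import math

EPS = 2.5e-6  # assumed perturbation parameter, in 1/sec


def approximate_class_probabilities(t, eps=EPS):
    """Closed-form class probabilities of the approximate Markov process."""
    e2, e3 = math.exp(-2 * eps * t), math.exp(-3 * eps * t)
    return [e3, 3 * (e2 - e3), 1 - 3 * e2 + 2 * e3]


if __name__ == "__main__":
    for t in (160.0, 320.0, 3200.0):
        row = approximate_class_probabilities(t)
        print(int(t), " ".join(f"{value:.4e}" for value in row))
```

Classes 1 and 2 reproduce the tabulated figures to the printed precision; the class 3 entries agree to about three significant figures, which is consistent with the cancellation involved in evaluating a quantity of order $10^{-7}$ from terms of order one.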

Figure 4-10 indicates that the largest absolute error between the two results is only 0.0001 in class 1 at the end of the mission. This shows that the enlarged process approximates the aggregated probability distribution of the exact model very well. With the inaccurate baseline results taken into account, the absolute error of the approximate state probability distribution, obtained by expanding the approximate Markov process with the analytical stationary probability distribution as in Eq. (4.62), will be less than 0.0000117 for any state beyond t = 800 sec. This time is only 1/2000 of the MTTF of each instrument.

The high accuracy of the enlarged process approximation led to a closer examination of the example system. Undoubtedly, the model for this system is a "pure" semi-Markov model because none of the transition kernel elements has an exponentially distributed holding time. If the states in each class are examined carefully, it can be found that all the states within each class represent the same number of working and failed instruments. By combinatorial analysis, the time to transition from class 1 to class 2 is exponentially distributed with a parameter of $3\epsilon$. So the 9-state model class to class transition is intrinsically governed by a Markov process. Because the 9-state model has this property, it is not trivial to deduce whether the approximate technique did accurately approximate the state probability distribution of a genuine semi-Markov process.

Because  of  this  special  property  of  the  9-state  model,  the  approximate 
technique  will  be  further  tested  on  several  semi-Markov  models  in  the  next  chapter. 
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Chapter  5 

Further  Tests  of  Approximate  Technique 
with  4-State  Models 

The approximate technique applied to the 9-state model was demonstrated in the previous chapter. In order to further test the technique for other models that a fault-tolerant system might produce, without expending a lot of effort to create large state space models, several relatively small 4-state semi-Markov process models will be created in this chapter to simulate various fault-tolerant system class to class transition structures and properties, and to evaluate the results of the approximation technique.

There are five models to be examined in this chapter. Their detailed descriptions appear below, but they will be summarized here. In Case I, there are two ergodic classes where the second class is a trapping class. Case II has the property that ergodic class 1 can transit to trapping classes 2 or 3. The difference between this case and Case III is that in Case III, class 2 can transit back to class 1.

The next example, Case IV, consists of two non-ergodic classes where class 2 is a trapping class. Case V, the last model, comprises four classes, where classes 3 and 4 can be entered from both class 1 and class 2.

5.1 Case I

In this model, the semi-Markov process consists of four states. The state transition diagram is shown in Figure 5-1.


class  1 


class  2 


Figure  5-1:  State  transition  diagram  for  Case  I 

The process can be decomposed into two classes, namely class 1 and class 2, when $\epsilon = 0$. Class 1 comprises states 1 and 2 and class 2 comprises states 3 and 4. The transition from class 1 to class 2 is through the small eventual transition probabilities, of order $\epsilon$, from states 1 and 2 to states 3 and 4. However, state 3 and state 4 cannot transit back to any of the states in class 1, hence class 2 is a trapping class. The governing transition kernel matrix is given by the following:


where $\lambda_1 = 0.2$, $\lambda_2 = 0.1$, $\epsilon = 2.5 \times 10^{-6}$ (all units are in sec$^{-1}$).

It is assumed that the initial condition is,

$$\pi(0) = [\,1\ 0\ 0\ 0\,]^T \qquad (5.2)$$

One point about this model to be emphasized is that the holding time density functions for the transitions from states in class 1 to states in class 2 and those within class 2 are 2nd order Erlang PDFs. These are non-exponential holding time density functions, so the model is a semi-Markov process.

Stationary probability distribution of the non-perturbed semi-Markov process

By setting $\epsilon = 0$ and dropping all the holding time density functions in the transition kernel matrix, the transition probability matrix of the non-perturbed Markov process is found to be:

$$\begin{bmatrix} 0 & 0.3 & 0 & 0 \\ 1 & 0.7 & 0 & 0 \\ 0 & 0 & 0.4 & 0.5 \\ 0 & 0 & 0.6 & 0.5 \end{bmatrix} \qquad (5.3)$$

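By raising the single step transition probability matrix successively to higher powers, the stationary interval transition probability matrix is found to be:

$$\begin{bmatrix} 0.2308 & 0.2308 & 0 & 0 \\ 0.7692 & 0.7692 & 0 & 0 \\ 0 & 0 & 0.4545 & 0.4545 \\ 0 & 0 & 0.5455 & 0.5455 \end{bmatrix} \qquad (5.4)$$

Then the stationary probability vectors of the non-perturbed imbedded Markov processes in classes 1 and 2 are:

$$[\,0.2308\ \ 0.7692\,]^T \qquad (5.5a)$$

$$[\,0.4545\ \ 0.5455\,]^T \qquad (5.5b)$$

The mean waiting times for the states in class 1 are,

$$\bar{\tau}_1 = p_{21}\,\frac{1}{\lambda_2} = 10 \qquad (5.6)$$

$$\bar{\tau}_2 = p_{12}\,\frac{1}{\lambda_1} + p_{22}\,\frac{1}{\lambda_2} = 8.5 \qquad (5.7)$$

Therefore the mean waiting time of the process in class 1 is

$$0.2308 \times 10 + 0.7692 \times 8.5 = 8.846 \qquad (5.8)$$

Hence, the stationary probabilities in class 1 are

$$\pi^{(1)}_1 = \frac{0.2308 \times 10}{8.846} = 0.2609, \qquad \pi^{(1)}_2 = \frac{0.7692 \times 8.5}{8.846} = 0.7391 \qquad (5.9)$$

or in vector form,

$$\pi^{(1)} = [\,0.2609\ \ 0.7391\,]^T \qquad (5.10)$$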
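The class 1 calculation can be reproduced with a short script. This is a minimal sketch under two assumptions that are read off the surrounding equations rather than stated as a general rule: the matrix is column-stochastic (column j holds the transitions out of state j), and the non-perturbed holding time for a transition into state i is exponential with rate $\lambda_i$. The function names are illustrative.

```python
# Class 1 block of the non-perturbed transition probability matrix.
# P_CLASS1[i][j] is the probability of going from state j+1 to state i+1.
P_CLASS1 = [[0.0, 0.3],
            [1.0, 0.7]]
RATES = [0.2, 0.1]  # lambda_1, lambda_2 in 1/sec


def imbedded_stationary(p, iterations=200):
    """Stationary vector of the imbedded chain, by repeated multiplication."""
    n = len(p)
    v = [1.0] + [0.0] * (n - 1)
    for _ in range(iterations):
        v = [sum(p[i][j] * v[j] for j in range(n)) for i in range(n)]
    return v


def mean_waiting_times(p, rates):
    """Mean waiting time in state j: sum over destinations i of p_ij / lambda_i."""
    n = len(p)
    return [sum(p[i][j] / rates[i] for i in range(n)) for j in range(n)]


def semi_markov_stationary(p, rates):
    """Weight the imbedded stationary vector by the mean waiting times and normalize."""
    pi = imbedded_stationary(p)
    tau = mean_waiting_times(p, rates)
    total = sum(a * b for a, b in zip(pi, tau))
    return [a * b / total for a, b in zip(pi, tau)]


if __name__ == "__main__":
    print(imbedded_stationary(P_CLASS1))            # about [0.2308, 0.7692]
    print(mean_waiting_times(P_CLASS1, RATES))      # [10.0, 8.5]
    print(semi_markov_stationary(P_CLASS1, RATES))  # about [0.2609, 0.7391]
```

Repeated multiplication converges quickly here because the second eigenvalue of the class 1 block has magnitude 0.3.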

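The mean waiting times for the states in class 2 are:

$$\bar{\tau}_3 = p_{33}\,\bar{\tau}_{33} + p_{43}\,\bar{\tau}_{43} = 16 \qquad (5.11)$$

$$\bar{\tau}_4 = p_{34}\,\bar{\tau}_{34} + p_{44}\,\bar{\tau}_{44} = 15 \qquad (5.12)$$

Therefore the mean waiting time of the process in class 2 is,

$$0.4545 \times 16 + 0.5455 \times 15 = 15.4545 \qquad (5.13)$$

Hence, the stationary probabilities in class 2 are,

$$\pi^{(2)}_3 = \frac{0.4545 \times 16}{15.4545} = 0.4705 \qquad (5.14a)$$

$$\pi^{(2)}_4 = \frac{0.5455 \times 15}{15.4545} = 0.5295 \qquad (5.14b)$$

or in vector form,

$$\pi^{(2)} = [\,0.4705\ \ 0.5295\,]^T \qquad (5.15)$$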

Approximate Markov process

In all five cases in this chapter, the time scale factor $\delta$ is set equal to $\epsilon$, and an approach similar to that of the 9-state model example in the last chapter is used for evaluating the approximate Markov process.

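The Laplace transform of the kernel element for transition from aggregated "state" 1 to aggregated "state" 2 is given by:

$$p_{21}\,\frac{\Lambda_1}{s + \Lambda_1} \qquad (5.16)$$

where $\Lambda_1$ is the ratio of the exit coefficients $q^{(1)}_i$ to the mean waiting times $\bar{\tau}_i$ of the states in class 1, both weighted by the stationary probabilities of the non-perturbed imbedded process in class 1.    (5.17)

From the transition kernel matrix in Eq. (5.1),

$$q^{(1)}_1 = 6 \qquad (5.18)$$

$$q^{(1)}_2 = 9 \qquad (5.19)$$

Substituting all the numerical quantities in Eq. (5.17):

$$\Lambda_1 = \frac{0.2308 \times 6 + 0.7692 \times 9}{0.2308 \times 10 + 0.7692 \times 8.5} = 0.9391 \qquad (5.20)$$

Obviously from the structure of the class to class transitions,

$$p_{21} = 1 \qquad (5.21)$$

Therefore, the kernel element becomes

$$\frac{0.9391}{s + 0.9391} \qquad (5.22)$$

or in the scaled time domain,

$$0.9391\, e^{-0.9391 t'} \qquad (5.23)$$

Since there are only two classes and class 2 cannot transit to class 1, the approximate probability in class 2 is given by,

$$\pi^{e}_2(t') = \int_0^{t'} 0.9391\, e^{-0.9391 u}\, du = 1 - e^{-0.9391 t'} \qquad (5.24)$$

and the approximate probability in class 1 is,

$$\pi^{e}_1(t') = 1 - \pi^{e}_2(t') = e^{-0.9391 t'} \qquad (5.25)$$

In the original time scale, this becomes

$$\pi^{e}_1(t) = e^{-0.9391 \times 2.5 \times 10^{-6}\, t} \qquad (5.26)$$

$$\pi^{e}_2(t) = 1 - e^{-0.9391 \times 2.5 \times 10^{-6}\, t} \qquad (5.27)$$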
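The rate of the aggregated "state" 1 and the resulting class 1 probability can be evaluated numerically as a check. The sketch below takes the imbedded stationary probabilities 0.2308 and 0.7692 and the mean waiting times 10 and 8.5 from the text. The exit coefficients 6 and 9 are an assumption: they are the values that reproduce the quoted rate 0.9391. The names are illustrative.

```python
import math

EPS = 2.5e-6                 # perturbation parameter, in 1/sec
PI_IMBEDDED = [0.2308, 0.7692]
MEAN_WAIT = [10.0, 8.5]      # mean waiting times of states 1 and 2
EXIT_COEFF = [6.0, 9.0]      # assumed coefficients of epsilon for leaving class 1


def transition_rate():
    """Weighted exit coefficient divided by weighted mean waiting time."""
    numerator = sum(p * q for p, q in zip(PI_IMBEDDED, EXIT_COEFF))
    denominator = sum(p * w for p, w in zip(PI_IMBEDDED, MEAN_WAIT))
    return numerator / denominator


def class1_probability(t, eps=EPS):
    """Approximate class 1 probability in the original time scale."""
    return math.exp(-transition_rate() * eps * t)


if __name__ == "__main__":
    print(round(transition_rate(), 4))
    for t in (100, 1000, 10000):
        print(t, f"{class1_probability(t):.5f}")
```

The printed class 1 probabilities can be compared directly with the approximate column of the comparison table for this case.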

Exact solution of the original semi-Markov process

The exact solution of the semi-Markov process is to be evaluated analytically by using Eq. (4.40). Although there are only four states in the model, the manipulation requires the help of a powerful symbolic manipulation program called MACSYMA, which resides in the Multics system at the Massachusetts Institute of Technology. Two of the elements of the interval transition probability matrix are obtained as follows:

#u  (0  *  0.030115  (  e~23Se~6  ‘  -  e-°  22999815 1 1 

+  0.230748  [  «-«*“• «  -  e-°  22999815  *  J 

+  5.384780c— 7  t  e~°  1  *  4-  0.538474  e“°  1  f 

+  2.833533c— 5c~°  2  1  (5.28) 


#21  (*)  *  0.039775  ( e"2-35  e“®  *  -  e°  22990815  ‘  ] 

+  0.538533  |  e"2-35  -•  *  -  e°  22908815  *  ] 

-(  5.7603620— 7(  +  0.538483  )«“° 1  ‘  -  5.000e-5  e“°  2  *  (5.29) 

Since the initial condition was assumed in Eq. (5.2) to be $\pi(0) = [\,1\ 0\ 0\ 0\,]^T$, then

$$\pi_1(t) = \phi_{11}(t) \qquad (5.30)$$

$$\pi_2(t) = \phi_{21}(t) \qquad (5.31)$$

Therefore the total probability in class 1 is,

$$P_{E_1}(t) = \pi_1(t) + \pi_2(t) \qquad (5.32)$$

Comparison of results

The approximate and exact total probabilities in class 1 at different time points are compared in Figure 5-2. The results indicate that errors in the approximation occur only at the fifth decimal place through the time history up to t = 10000, with the maximum relative percentage error occurring at t = 1000 at a value of only 0.0002%. This shows that the class probability is well approximated by the enlarged process.

After the class probability results have been compared, the normalized probability distribution within class 1 is compared with the stationary probability distribution that is given in Eq. (5.9). The normalized probability distribution history in class 1 is shown in Figure 5-3.

By comparing the stationary normalized probability distribution that was established after 100 seconds with the stationary probability distribution of the non-perturbed semi-Markov process in class 1 obtained in Eq. (5.9), it is found that


t/sec.    approximate class           exact class
          probability $\pi^{e}_1(t)$  probability $P_{E_1}(t)$

1         0.99999                     1.00000
5         0.99999                     1.00000
10        0.99998                     0.99999
50        0.99988                     0.99990
100       0.99977                     0.99978
500       0.99883                     0.99884
1000      0.99765                     0.99767
5000      0.98833                     0.98834
10000     0.97680                     0.97679

Figure 5-2: Comparison of approximate and exact probability in class 1

there  is  no  error  up  to  4  decimal  places.  This  implies  that  the  semi-Markov  process 
is  well  approximated  to  within  0.0002%  error  after  the  transient  period  of  100 
seconds  at  the  beginning  of  the  mission.  The  transient  period  is  about  10  times  the 
maximum  mean  waiting  time  among  the  states  of  the  non-perturbed  process  in 
class  1  and  0.025%  of  the  MTTF. 
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0.0925 
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0.6510 
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10 

0.4791 
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40 

0.2707 

0.7293 

100 

0.2609 

0.7391 

200 

0.2609 

0.7391 

600 

0.2609 

0.7391 

Figure  5-3:  Normalized  probability  distribution  in  class  1 


5.2 Case II

In Case I, the semi-Markov process was well approximated by the enlarged process after the transient period. However, the model there is not general enough to include different classes that class 1 can transit to, as is likely to be the case for many fault-tolerant system models. Ironically, in the 9-state model there is not a class that can transit to both of the other two classes. In order to investigate how valid the approximation is, another model will be formed. It consists of four states which decompose into three classes. Class 1 comprises states 1 and 2; classes 2 and 3 are just states 3 and 4, respectively. Class 1 can transit to classes 2 and 3, while classes 2 and 3 (states 3 and 4) are trapping classes. The state transition diagram is shown in Figure 5-4 and the process' governing transition kernel matrix is defined as follows:

t 
kernel elements for exits from class 1 are similar to those in Case I, the non-perturbed semi-Markov process stationary probability distribution for class 1 and $\Lambda_1$ are the same. These results are repeated for convenience here,

$$\pi^{(1)} = [\,0.2609\ \ 0.7391\,]^T,$$

$$\Lambda_1 = 0.9391. \qquad (5.34)$$


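However, there will be different eventual transition probabilities from aggregated "state" 1 to aggregated "states" 2 and 3. They are evaluated from Eq. (5.35) to (5.37) as the ratio of the exit coefficients from the states of class 1 into the destination class to the total exit coefficients of those states, both weighted by the stationary probabilities of the non-perturbed imbedded process in class 1. The total exit coefficients of states 1 and 2 are 6 and 9, and the exit coefficients into class 2 are 2 and 6. Substituting these quantities into Eq. (5.35), then

$$p_{21} = \frac{0.2308 \times 2 + 0.7692 \times 6}{0.2308 \times 6 + 0.7692 \times 9} = 0.6111 \qquad (5.38)$$

Since class 1 can only transit to classes 2 and 3:

$$p_{31} = 1 - p_{21} = 0.3889 \qquad (5.39)$$

Then the transition kernel elements exiting aggregated "state" 1 are, for the transition to "state" 2,

$$0.6111\,\frac{0.9391}{s + 0.9391} \qquad (5.40)$$

and for the transition to "state" 3,

$$0.3889\,\frac{0.9391}{s + 0.9391} \qquad (5.41)$$

If the initial state probability vector is,

$$\pi(0) = [\,1\ 0\ 0\ 0\,]^T \qquad (5.42)$$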


then the probabilities in classes 2 and 3 in scaled time t' will be approximated by:

$$\pi^{e}_2(t') = 0.6111\,[\,1 - e^{-0.9391 t'}\,] \qquad (5.43)$$

$$\pi^{e}_3(t') = 0.3889\,[\,1 - e^{-0.9391 t'}\,] \qquad (5.44)$$

or, in the original time scale,

$$\pi^{e}_2(t) = 0.6111\,[\,1 - e^{-0.9391 \times 2.5 \times 10^{-6}\, t}\,] \qquad (5.45)$$

$$\pi^{e}_3(t) = 0.3889\,[\,1 - e^{-0.9391 \times 2.5 \times 10^{-6}\, t}\,] \qquad (5.46)$$
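The approximate class probabilities for this case split the probability that has left class 1 in the fixed proportions 0.6111 and 0.3889. A minimal sketch of that evaluation follows, using only the constants quoted in the text; the function name is illustrative.

```python
import math

EPS = 2.5e-6    # perturbation parameter, in 1/sec
RATE = 0.9391   # exit rate of aggregated "state" 1 in scaled time
P21 = 0.6111    # eventual probability of leaving class 1 for class 2
P31 = 0.3889    # eventual probability of leaving class 1 for class 3


def class_probabilities(t, eps=EPS):
    """Approximate probabilities of classes 1, 2 and 3 in the original time scale."""
    left = 1.0 - math.exp(-RATE * eps * t)   # probability that class 1 has been left
    return [1.0 - left, P21 * left, P31 * left]


if __name__ == "__main__":
    for t in (1000.0, 5000.0, 10000.0):
        print(int(t), [f"{value:.5f}" for value in class_probabilities(t)])
    # For very large t all of the probability ends up in the trapping classes.
    print(class_probabilities(1e12))
```

At very large t the result tends to [0, 0.6111, 0.3889], matching the limiting class probabilities.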


Exact  solution  of  the  original  semi-Markov  process 

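The exact solutions for $\phi_{31}(t)$ and $\phi_{41}(t)$ are evaluated analytically with the help of MACSYMA. If the initial probability distribution is as in Eq. (5.42) then,

$$P_{E_2}(t) = \phi_{31}(t) \qquad (5.47)$$

$$P_{E_3}(t) = \phi_{41}(t) \qquad (5.48)$$

that is,

$$P_{E_2}(t) = 3.15232 \times 10^{-13}\, e^{-0.11500 t}\,[\,-1.93859 \times 10^{12} \sinh(0.114998 t) - 1.93859 \times 10^{12} \cosh(0.114998 t)\,] - 1.02947 \times 10^{-6}\, e^{-0.4 t} + 0.61111 \qquad (5.49)$$

$$P_{E_3}(t) = 3.15232 \times 10^{-13}\, e^{-0.11500 t}\,[\,-1.23364 \times 10^{12} \sinh(0.114998 t) - 1.23383 \times 10^{12} \cosh(0.114998 t)\,] - 8.77778 \times 10^{-6}\, e^{-0.5 t} + 0.38889 \qquad (5.50)$$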

It  can  be  seen  that  the  interval  transition  probability  functions  from  state  1  to 
state  3  and  from  state  1  to  state  4  are  a  sum  of  exponential  terms  despite  the  fact 
that  all  of  the  holding  time  density  functions  in  the  model  are  exponential. 

Comparison  of  results 

The exact and approximate class probability results in the closed form obtained above are evaluated and compared at different time points up to 10,000 seconds in Figure 5-5. From the numerical results in the figure, it can be seen that the maximum error occurs in the fifth decimal place up to t = 10,000 seconds.

Figure 5-5: Comparison of approximate and exact classes probabilities (legible entries: 0.00713, 0.00455, 0.00454; at t = 10000: 0.01418, 0.00902)

The approximate class probability distribution when all the probability in class 1 has moved to classes 2 and 3 can be obtained by substituting $t = \infty$ into Eq. (5.45) and (5.46). The results are:


$$\pi^{e}_2(\infty) = 0.6111 \qquad (5.51)$$

$$\pi^{e}_3(\infty) = 0.3889 \qquad (5.52)$$

The class probability distribution at $t = \infty$ can also be obtained from the exact solution in Eq. (5.49) and (5.50). All the terms in both equations except the constant terms will vanish when $t = \infty$, so the class probability distribution will be,

$$P_{E_2}(\infty) = 0.6111 \qquad (5.53)$$

$$P_{E_3}(\infty) = 0.3889 \qquad (5.54)$$
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By  comparing  the  exact  and  approximate  results,  it  can  be  speculated  that  the 
entire  class  probability  history  is  well  approximated  by  the  approximate  Markov 
process. 

It  was  demonstrated  in  this  example  that  the  class  probability  trajectory  is 
also  well  approximated  by  the  approximate  Markov  process  for  a  particular  model 
where  a  class  can  transit  to  two  different  classes. 

5.3 Case III

The 9-state model and the models in Cases I and II do not yield an associated aggregated model for which classes 2 or 3 can transit back to class 1. This situation would arise in models of fault-tolerant systems that include on-line repair. This provides the motivation to create a new model to demonstrate the accuracy of the approximate Markov process for this situation. The new model is similar to the one used in Case II in that class 1 can transit to both class 2 and class 3. However, class 2 can transit back to class 1 in the new model. The state transition diagram of this new model is shown in Figure 5-6. The transition kernel matrix is similar to that of Case II except that there are transitions possible back to class 1 from class 2. This yields two new nonzero off-diagonal elements in the transition kernel matrix, which is defined as follows:

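$$\begin{bmatrix} 0 & (0.3 - 7\epsilon)\lambda_1 e^{-\lambda_1 t} & 2\epsilon\lambda_1 e^{-\lambda_1 t} & 0 \\ (1 - 6\epsilon)\lambda_2 e^{-\lambda_2 t} & (0.7 - 2\epsilon)\lambda_2 e^{-\lambda_2 t} & 4\epsilon\lambda_2 e^{-\lambda_2 t} & 0 \\ 2\epsilon\lambda_3 e^{-\lambda_3 t} & 6\epsilon\lambda_3 e^{-\lambda_3 t} & (1 - 6\epsilon)\lambda_3 e^{-\lambda_3 t} & 0 \\ 4\epsilon\lambda_4 e^{-\lambda_4 t} & 3\epsilon\lambda_4 e^{-\lambda_4 t} & 0 & \lambda_4 e^{-\lambda_4 t} \end{bmatrix}$$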
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class  1 

class  2 
class  3 


Figure  5-6:  State  transition  diagram  for  Case  III 
where $\lambda_1 = 0.2$, $\lambda_2 = 0.1$, $\lambda_3 = 0.4$, $\lambda_4 = 0.5$, $\epsilon = 2.5 \times 10^{-6}$ (all units are in sec$^{-1}$).

Stationary probability distribution of the non-perturbed semi-Markov process

The structure of the process in this case is different from that of Case II. However, the non-perturbed process in class 1 is exactly the same as that of Cases I and II. So the stationary probability distribution is the same as before, namely

$$\pi^{(1)} = [\,0.2609\ \ 0.7391\,]^T \qquad (5.56)$$

Since classes 2 and 3 each consist of only one state, their stationary distributions are,

$$\pi^{(2)} = [\,1\,]^T \qquad (5.57)$$

$$\pi^{(3)} = [\,1\,]^T \qquad (5.58)$$


Approximate Markov process

Because of the similarity of this process with that of Case II, some of the parameters of the approximate Markov process are the same, and they are stated here:

$$p_{21} = 0.6111 \qquad (5.59)$$

$$p_{31} = 0.3889 \qquad (5.60)$$

$$\Lambda_1 = 0.9391 \qquad (5.61)$$

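In this model, however, class 2 can transit back to class 1, so $\Lambda_2$ is calculated from Eq. (5.62) as the ratio of the exit coefficient of state 3 to its mean waiting time. From the transition kernel matrix, the exit coefficient of state 3 is 2 + 4 = 6, and

$$\bar{\tau}_3 = \sum_{j \in E_2} p_{j3}\,\bar{\tau}_{j3} = p_{33}\,\frac{1}{\lambda_3} = 2.5$$

Substituting these quantities into Eq. (5.62),

$$\Lambda_2 = 6 / 2.5 = 2.4$$

Class 2 transits only to class 1, therefore

$$p_{12} = 1 \qquad (5.63)$$

and the kernel element for the transition from aggregated "state" 2 to aggregated "state" 1 is

$$\frac{2.4}{s + 2.4} \qquad (5.64)$$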


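Then the approximate aggregated Markov process in the new time scale is characterized by the following Laplace transformed transition kernel elements: from "state" 2 to "state" 1,

$$\frac{2.4}{s + 2.4}$$

and from "state" 1 to "states" 2 and 3, respectively,

$$0.6111\,\frac{0.9391}{s + 0.9391}, \qquad 0.3889\,\frac{0.9391}{s + 0.9391} \qquad (5.65)$$

or in the scaled time domain,

$$2.4\, e^{-2.4 t'}, \qquad 0.6111 \times 0.9391\, e^{-0.9391 t'}, \qquad 0.3889 \times 0.9391\, e^{-0.9391 t'} \qquad (5.66)$$

If the initial condition of the process is

$$\pi(0) = [\,1\ 0\ 0\ 0\,]^T \qquad (5.67)$$

then,

$$\pi^{e}_i(t') = \phi^{e}_{i1}(t') \qquad (5.68)$$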

By using continuous time Markov theory, the interval transition probability matrix and then $\pi^{e}(t)$ in the original time scale are found to be:

$$\pi^{e}_1(t) = 1.0710 \times 10^{-8}\, e^{-1.6696 \times 2.5 \times 10^{-6}\, t}\,[\,4.9338 \times 10^{7} \sinh(1.3824 \times 2.5 \times 10^{-6}\, t) + 9.3373 \times 10^{7} \cosh(1.3824 \times 2.5 \times 10^{-6}\, t)\,] \qquad (5.69)$$

$$\pi^{e}_2(t) = 4.1517 \times 10^{-1}\, e^{-1.6696 \times 2.5 \times 10^{-6}\, t} \sinh(1.3824 \times 2.5 \times 10^{-6}\, t) \qquad (5.70)$$

$$\pi^{e}_3(t) = 1.0710 \times 10^{-8}\, e^{-1.6696 \times 2.5 \times 10^{-6}\, t}\,[\,-8.8103 \times 10^{7} \sinh(1.3824 \times 2.5 \times 10^{-6}\, t) - 9.3372 \times 10^{7} \cosh(1.3824 \times 2.5 \times 10^{-6}\, t)\,] + 1.00000 \qquad (5.71)$$
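The decay rate and the hyperbolic arguments in these expressions follow from the two nonzero eigenvalues of the aggregated process. The sketch below rebuilds them from the rates 0.9391 and 2.4 and the probabilities 0.6111 and 0.3889, in the scaled time $t' = \epsilon t$, and checks the resulting closed form against a direct numerical integration. The function names and the Runge-Kutta integrator are illustrative choices.

```python
import math

L1, L2 = 0.9391, 2.4       # exit rates of aggregated "states" 1 and 2 (scaled time)
P21, P31 = 0.6111, 0.3889  # eventual probabilities out of "state" 1


def closed_form_constants():
    """Decay rate a, hyperbolic argument b, and the two sinh coefficients."""
    a = (L1 + L2) / 2.0
    b = math.sqrt(a * a - L1 * L2 * P31)
    return a, b, (a - L1) / b, L1 * P21 / b


def closed_form(t):
    """Class probabilities in scaled time, starting from class 1."""
    a, b, c1, c2 = closed_form_constants()
    decay = math.exp(-a * t)
    p1 = decay * (math.cosh(b * t) + c1 * math.sinh(b * t))
    p2 = decay * c2 * math.sinh(b * t)
    return [p1, p2, 1.0 - p1 - p2]


def derivative(p):
    """Rate equations of the aggregated three-class Markov process."""
    return [-L1 * p[0] + L2 * p[1], L1 * P21 * p[0] - L2 * p[1], L1 * P31 * p[0]]


def integrate(t_end, steps=4000):
    """Classical fourth-order Runge-Kutta integration from [1, 0, 0]."""
    p = [1.0, 0.0, 0.0]
    h = t_end / steps
    for _ in range(steps):
        k1 = derivative(p)
        k2 = derivative([x + 0.5 * h * k for x, k in zip(p, k1)])
        k3 = derivative([x + 0.5 * h * k for x, k in zip(p, k2)])
        k4 = derivative([x + h * k for x, k in zip(p, k3)])
        p = [x + h * (q + 2 * r + 2 * s + u) / 6
             for x, q, r, s, u in zip(p, k1, k2, k3, k4)]
    return p


if __name__ == "__main__":
    print(closed_form_constants())
    print(closed_form(2.0), integrate(2.0))
```

The constants come out close to 1.6696, 1.3824 and 0.4152, in line with the decay rate, the hyperbolic argument and the class 2 coefficient that appear in the expressions for this case.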


Exact solution of the original semi-Markov process

The exact solution of total probability in each class in this example is calculated numerically by the same matrix convolution sum method as was used for the 9-state model. The normalized probability distribution in class 1 was calculated analytically with the help of MACSYMA. The results appear below.

Comparison  of  results 

The results for the exact and for the approximate class probability distributions are compared in Figure 5-7. This example shows again that the exact aggregated probability distribution is well approximated by the enlarged process, because the maximum errors occur only at the fifth decimal place.

The normalized probability distribution in class 1 is shown in Figure 5-8. It can be seen from the normalized probability distribution history in Figure 5-8 that stationarity is established after t = 200 seconds. When this is compared with the stationary probability distribution of the non-perturbed process in class 1 that is given in Eq. (5.56), it is found that they agree up to four decimal places.

It  has  been  emphasized  here  that  in  this  model,  there  are  transitions  possible 


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Figure  5-7:  Comparison  of  approximate  and  exact  classes  probabilities 


state  1 


state  2 


t/aec. 

1 

state  1 

0 

10 

0.4791 

50 

0.26449 

100 

0.28089 

1 

0.26086 

. 

0  26086 

1000 

0.26086 

0.00000 

0.5209 

0.73551 

0.73911 

0.73914 

0.73914 

0.73914 


Figure 5-8: Normalized probability distribution in class 1

both  out  of  and  into  class  1  and  it  has  been  shown  from  the  results  above  that  the 
semi-Markov  process  is  well  approximated  in  this  case  when  the  enlarged  process  is 
expanded  by  the  stationary  probability  distribution  of  the  non-perturbed  process. 


5.4 Case IV

For some fault-tolerant system semi-Markov models, there may be trapping states among some classes of states. Under these circumstances, the ergodicity condition in the Theorem presented in Chapter 2 will not be satisfied by these kinds of models. It is of interest to know whether the approximate technique will be valid for some of these models. So, in this 4-state example, a model with two non-ergodic classes is created where each class consists of two states. The state transition diagram is shown in Figure 5-9 and the process is governed by the transition kernel matrix in Eq. (5.72).


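$$\begin{bmatrix} (0.5 - \epsilon)\lambda_1 e^{-\lambda_1 t} & 0 & 0 & 0 \\ (0.5 - 5\epsilon)\lambda_2 e^{-\lambda_2 t} & (1 - 9\epsilon)\lambda_2 e^{-\lambda_2 t} & 0 & 0 \\ 2\epsilon\lambda_3 e^{-\lambda_3 t} & 6\epsilon\lambda_3 e^{-\lambda_3 t} & 0.4\lambda_3 e^{-\lambda_3 t} & 0 \\ 4\epsilon\lambda_4 e^{-\lambda_4 t} & 3\epsilon\lambda_4 e^{-\lambda_4 t} & 0.6\lambda_4 e^{-\lambda_4 t} & \lambda_4 e^{-\lambda_4 t} \end{bmatrix} \qquad (5.72)$$

where $\lambda_1 = 0.2$, $\lambda_2 = 0.1$, $\lambda_3 = 0.4$, $\lambda_4 = 0.5$, $\epsilon = 2.5 \times 10^{-6}$ (all units are in sec$^{-1}$).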
Figure 5-9: State transition diagram for Case IV

Stationary probability distribution of the non-perturbed semi-Markov process

By decomposing the transition kernel matrix, the transition probability
matrix of the non-perturbed imbedded Markov process is found to be

$$
\begin{bmatrix}
0.5 & 0 & 0 & 0 \\
0.5 & 1 & 0 & 0 \\
0 & 0 & 0.4 & 0 \\
0 & 0 & 0.6 & 1
\end{bmatrix}
\tag{5.73}
$$

By raising the transition probability matrix to successively higher powers until
a steady state is established, the stationary interval transition probability matrix is

$$
\begin{bmatrix}
0 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 1 & 1
\end{bmatrix}
\tag{5.74}
$$
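The decomposition and the repeated multiplication described above can be checked numerically. The following is a minimal sketch, not the procedure actually used in the thesis: it assumes the column-stochastic probabilities read off Eq. (5.72), with the (1,1) entry taken as 0.5 − ε so that the first column sums to one, and the helper names are hypothetical.

```python
def transition_matrix(eps):
    """Imbedded transition probabilities read off Eq. (5.72); column i holds
    the probabilities of leaving state i. The (1,1) entry 0.5 - eps is an
    assumption chosen so that the first column sums to one."""
    return [
        [0.5 - eps,     0.0,         0.0, 0.0],
        [0.5 - 5 * eps, 1 - 9 * eps, 0.0, 0.0],
        [2 * eps,       6 * eps,     0.4, 0.0],
        [4 * eps,       3 * eps,     0.6, 1.0],
    ]


def matmul(a, b):
    """Plain square matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def matrix_power(p, n):
    """Multiply p by itself n times, starting from the identity."""
    size = len(p)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = matmul(result, p)
    return result


def column_sums(p):
    return [sum(row[j] for row in p) for j in range(len(p))]


p_eps = transition_matrix(2.5e-6)   # perturbed process
p0 = transition_matrix(0.0)         # non-perturbed process, Eq. (5.73)
p_inf = matrix_power(p0, 200)       # numerically stationary, Eq. (5.74)
```

Setting `eps` to zero removes every interclass transition, which is why `p0` is block diagonal and its high powers settle on the trapping states 2 and 4.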

Therefore, the stationary probability vectors of the non-perturbed imbedded
Markov processes in classes 1 and 2 are:

$$ \underline{\rho}^{(1)} = [\,0 \;\; 1\,]^T \tag{5.75} $$

$$ \underline{\rho}^{(2)} = [\,0 \;\; 1\,]^T \tag{5.76} $$

Hence, the stationary probability vectors of the non-perturbed semi-Markov
processes are

$$ \underline{\pi}^{(1)} = [\,0 \;\; 1\,]^T \tag{5.77} $$

$$ \underline{\pi}^{(2)} = [\,0 \;\; 1\,]^T \tag{5.78} $$

Approximate Markov process

The Laplace transform of the transition kernel for transitions from
aggregated "state" 1 to "state" 2 is given by

$$ \phi_{21}(s) = p_{21}\,\frac{\Lambda_1}{s+\Lambda_1} \tag{5.79} $$

where

$$ \Lambda_1 = \frac{\sum_{i\in E_1} \rho_i^{(1)}\, q_i^{(1)}}{\sum_{i\in E_1} \rho_i^{(1)}\, m_i^{(1)}} \tag{5.80} $$

From the transition kernel matrix in Eq. (5.72),

$$ q_2^{(1)} = 9, \qquad m_2^{(1)} = p_{22}\,\tau_{22} = 10 $$

Substituting the above quantities and Eq. (5.75) into Eq. (5.80) gives

$$ \Lambda_1 = \frac{0\times q_1^{(1)} + 9}{0\times m_1^{(1)} + 10} = 0.9 \tag{5.81} $$

and from the structure of the model,

$$ p_{21} = 1 \tag{5.82} $$

So the transition kernel element for the transition from aggregated "state" 1 to 2 in
the new time scale is

$$ \phi_{21}(s) = \frac{0.9}{s+0.9} \tag{5.83} $$

or, in the scaled time domain,

$$ \phi_{21}(t') = 0.9\,e^{-0.9\,t'} \tag{5.84} $$

Because there are only two classes:

$$ \pi^e_1(t') = e^{-0.9\,t'} \tag{5.85} $$

$$ \pi^e_2(t') = 1 - e^{-0.9\,t'} \tag{5.86} $$

In the original time scale,

$$ \pi^e_1(t) = e^{-0.9\times2.5\times10^{-6}\,t} \tag{5.87} $$

$$ \pi^e_2(t) = 1 - e^{-0.9\times2.5\times10^{-6}\,t} \tag{5.88} $$
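The arithmetic of this subsection can be reproduced in a few lines. This is only a sketch under stated assumptions: the aggregate rate is taken to be the ratio of the stationary-weighted exit coefficients to the stationary-weighted mean holding times, which is how the substitution into Eq. (5.80) reads, and the values for state 1 (exit coefficient 6, mean holding time 7.5 sec) are inferred from the kernel rather than quoted from the text. They carry zero weight, so they do not affect the result.

```python
import math

EPS = 2.5e-6  # small parameter of Case IV

# Stationary vector of the non-perturbed imbedded process in class 1, Eq. (5.75).
STATIONARY = [0.0, 1.0]
# Exit coefficients (multipliers of eps for leaving class 1).
# State 1: 2 + 4 = 6 (assumed from the kernel); state 2: 6 + 3 = 9.
EXIT_COEFFS = [6.0, 9.0]
# Mean holding times in sec. State 1: 0.5/0.2 + 0.5/0.1 = 7.5 (assumed);
# state 2: 1/0.1 = 10.
MEAN_HOLDING = [7.5, 10.0]


def aggregate_rate(weights, exits, means):
    """Weighted exit coefficient divided by weighted mean holding time."""
    numerator = sum(w * q for w, q in zip(weights, exits))
    denominator = sum(w * m for w, m in zip(weights, means))
    return numerator / denominator


LAMBDA_1 = aggregate_rate(STATIONARY, EXIT_COEFFS, MEAN_HOLDING)


def class_probabilities(t):
    """Approximate probabilities of classes 1 and 2 at time t (sec),
    starting in class 1."""
    p1 = math.exp(-LAMBDA_1 * EPS * t)
    return p1, 1.0 - p1
```

Because class 2 is the only destination, the class 1 probability is a single decaying exponential with rate 0.9 × 2.5×10⁻⁶ per second.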


Exact  solution  of  the  original  semi-Markov  process 

Exact solutions in closed form of the total probability and normalized
probability distribution in class 1 were evaluated with the help of MACSYMA and
they are

$$ P_{EX}(t) = \pi_1(t) + \pi_2(t) \tag{5.89} $$

$$ \bar{\pi}_1(t) = \pi_1(t)\,/\,P_{EX}(t) \tag{5.90} $$

$$ \bar{\pi}_2(t) = \pi_2(t)\,/\,P_{EX}(t) \tag{5.91} $$


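where,

$$
\begin{aligned}
\pi_1(t) ={}& 1.00339\times10^{5}\, e^{-9.99999\times10^{-2}\,t} - 1.00338\times10^{5}\, e^{-1.00000\times10^{-1}\,t} \\
& + 3.33334\times10^{-6}\, e^{-4.0\times10^{-1}\,t} + 7.50000\times10^{-6}\, e^{-5.0\times10^{-1}\,t} \\
& - 5.64401\times10^{-6}
\end{aligned}
\tag{5.92}
$$

$$
\begin{aligned}
\pi_2(t) ={}& 1.54580\times10^{-12}\, e^{-5.00014\times10^{-2}\,t}\,\big[\, 6.49386\times10^{16} \cosh( 4.99991\times10^{-2}\, t ) \\
& \qquad - 6.49373\times10^{16} \sinh( 4.99991\times10^{-2}\, t ) \,\big] \\
& - 1.00382\times10^{5}\, e^{-9.99999\times10^{-2}\,t} - 1.24998\times10^{-6}\, e^{-4.0\times10^{-1}\,t} \\
& - 5.62498\times10^{-7}\, e^{-5.0\times10^{-1}\,t} - 1.47474\times10^{-4}
\end{aligned}
\tag{5.93}
$$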

Comparison  of  results 

The total probability of occupying class 1 obtained from the approximate
aggregated Markov process and from the analytical solution of the original
semi-Markov process are compared in Figure 5-10. The exact and approximate solutions
listed in the figure agree to four decimal places except after one million seconds
have elapsed, where the error occurs in the fourth decimal place.

The  exact  normalized  probability  distribution  history  within  class  1  is  shown 
in  Figure  5-11.  It  can  be  seen  from  the  results  in  the  figure  that  the  stationary 
normalized  probability  distribution  agrees  with  the  stationary  probability 
distribution  of  the  non-perturbed  process  in  class  1.  The  trajectory  converges 
within  less  than  0.0003  absolute  error  after  t=100  seconds. 

This example, which consists of two non-ergodic classes, shows that the
aggregated probability distribution history of the original process is well
approximated by the approximate process and that the normalized probability
distribution in class 1 converges to the stationary probability distribution of the
non-perturbed process after a brief transient period, even though this model violates
the sufficient condition stated in references [4, 5] that the non-perturbed classes
be ergodic. The implication of this example is that some non-ergodic models can be
analyzed with the approximation technique. This opens a wider scope of fault-tolerant
system models to be approximately analyzed by this technique.

Figure 5-10: Comparison of approximate and exact probability in class 1

t/sec.    state 1    state 2
10        0.55183    0.44817
100       0.00027    0.99973
200       0.00000    1.00000
500       0.00000    1.00000

Figure 5-11: Normalized probability distribution in class 1

5.5  Case  V 

The  example  model  in  this  case,  the  last  in  this  chapter,  comprises  four 
classes,  and  both  classes  1  and  2  can  transit  to  class  3.  This  situation  is  found  in 
none  of  the  models  examined  before.  In  this  section,  only  the  exact  and 
approximate  probability  in  class  3  will  be  examined.  The  state  transition  diagram 
is  shown  in  Figure  5-12  and  the  process  is  characterized  by  the  following  transition 
kernel  matrix: 


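$$
\begin{bmatrix}
(1-60\epsilon)\,\lambda_1 e^{-\lambda_1 t} & 0 & 0 & 0 \\
0 & (1-9\epsilon)\,\lambda_2 e^{-\lambda_2 t} & 0 & 0 \\
20\epsilon\,\lambda_3 e^{-\lambda_3 t} & 6\epsilon\,\lambda_3 e^{-\lambda_3 t} & \lambda_3 e^{-\lambda_3 t} & 0 \\
40\epsilon\,\lambda_4 e^{-\lambda_4 t} & 3\epsilon\,\lambda_4 e^{-\lambda_4 t} & 0 & \lambda_4 e^{-\lambda_4 t}
\end{bmatrix}
\tag{5.94}
$$

where $\lambda_1 = 0.2$, $\lambda_2 = 0.1$, $\lambda_3 = 0.4$, $\lambda_4 = 0.5$, $\epsilon = 2.5\times10^{-6}$ (all units are in sec$^{-1}$).
States 1, 2, 3 and 4 are classes 1, 2, 3 and 4, respectively. Classes 1 and 2 both
transit to class 3.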

Approximate  aggregated  Markov  process 

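The Laplace transforms of the transition kernel elements for transitions from
aggregated "state" 1 to 3 and from aggregated "state" 2 to 3 are both given by:

$$ \phi_{3k}(s) = p_{3k}\,\frac{\Lambda_k}{s+\Lambda_k}, \qquad k = 1, 2 \tag{5.95} $$

From the parameters of the transition kernel matrix in Eq. (5.94), the following
quantities are obtained for class 2:

$$ q_2 = 9, \qquad m_2 = p_{22}\,\tau_{22} = 10 $$

Therefore,

$$ \Lambda_2 = \frac{q_2}{m_2} = 0.9 \tag{5.98} $$

and

$$ p_{32} = \frac{q^{(32)}}{q_2} = 0.6667 \tag{5.99} $$

The same steps applied to class 1, with $q_1 = 60$, $m_1 = p_{11}\,\tau_{11} = 5$ and
$q^{(31)} = 20$, give $\Lambda_1 = 12$ and $p_{31} = 0.3333$. So the kernel elements for
transitions to "state" 3 of the approximate aggregated Markov process are:

$$ \phi_{31}(s) = 0.3333\,\frac{12}{s+12} \tag{5.100} $$

$$ \phi_{32}(s) = 0.6667\,\frac{0.9}{s+0.9} \tag{5.101} $$

or, in the scaled time domain,

$$ \phi_{31}(t') = 0.3333 \times 12\, e^{-12\,t'} \tag{5.102} $$

$$ \phi_{32}(t') = 0.6667 \times 0.9\, e^{-0.9\,t'} \tag{5.103} $$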
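The parameters of the two kernel elements follow directly from the coefficients of the kernel matrix. The sketch below shows one way to compute them, assuming that the aggregate rate of a single-state class is its total exit coefficient divided by its mean holding time 1/λ, and that the branching probability is the share of the exit coefficient going to each destination. The dictionary layout and function name are illustrative only.

```python
# Holding-time rates of Eq. (5.94), in 1/sec.
RATES = {1: 0.2, 2: 0.1, 3: 0.4, 4: 0.5}

# Multipliers of epsilon in Eq. (5.94): EXIT_COEFFS[k][j] belongs to the
# transition from class k to class j.
EXIT_COEFFS = {
    1: {3: 20.0, 4: 40.0},
    2: {3: 6.0, 4: 3.0},
}


def class_parameters(k):
    """Return (aggregate rate, branching probabilities) for class k.

    Assumes a single-state class whose mean holding time is 1 / RATES[k].
    """
    total_exit = sum(EXIT_COEFFS[k].values())
    mean_holding = 1.0 / RATES[k]
    rate = total_exit / mean_holding
    branching = {j: c / total_exit for j, c in EXIT_COEFFS[k].items()}
    return rate, branching


rate_1, branch_1 = class_parameters(1)
rate_2, branch_2 = class_parameters(2)
```

With these inputs the rates come out as 12 and 0.9 and the branching probabilities into class 3 as 1/3 and 2/3, matching the constants that appear in the kernel elements for transitions to "state" 3.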


If the initial state probability vector is

$$ \underline{\pi}^e(0) = [\,0.5 \;\; 0.5 \;\; 0 \;\; 0\,]^T \tag{5.104} $$

then the probability in class 3 is

$$ \pi^e_3(t') = 0.5 \times 0.3333\,( 1 - e^{-12\,t'} ) + 0.5 \times 0.6667\,( 1 - e^{-0.9\,t'} ) \tag{5.105} $$

or, in the original time scale,

$$
\begin{aligned}
\pi^e_3(t) ={}& 0.5 \times 0.3333\,( 1 - e^{-12\times2.5\times10^{-6}\,t} ) \\
& + 0.5 \times 0.6667\,( 1 - e^{-0.9\times2.5\times10^{-6}\,t} )
\end{aligned}
\tag{5.106}
$$

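Exact solution of the original semi-Markov process

The exact solution for the probability in class 3 is evaluated analytically with
the help of MACSYMA and the result is:

$$
\begin{aligned}
P^{EX}_3(t) ={}& 0.5\,\big[ -6.66655\times10^{-1}\, e^{-2.25\times10^{-6}\,t} - 1.12501\times10^{-5}\, e^{-4.0\times10^{-1}\,t} + 6.66667\times10^{-1} \big] \\
& + 0.5\,\big[ -3.33308\times10^{-1}\, e^{-3.0\times10^{-5}\,t} - 2.50019\times10^{-5}\, e^{-4.0\times10^{-1}\,t} + 3.33333\times10^{-1} \big]
\end{aligned}
\tag{5.107}
$$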
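The approximate and exact expressions for the class 3 probability can be evaluated side by side. The sketch below does this at a handful of sample times. The coefficients are those read from Eqs. (5.106) and (5.107); the constant in the first bracket of the exact solution is taken as 6.66667×10⁻¹, an assumption made so that the bracket vanishes at t = 0. The sample times are arbitrary.

```python
import math

EPS = 2.5e-6  # small parameter of Case V


def approx_class3(t):
    """Class 3 probability from the approximate aggregated Markov process."""
    return (0.5 * 0.3333 * (1.0 - math.exp(-12.0 * EPS * t))
            + 0.5 * 0.6667 * (1.0 - math.exp(-0.9 * EPS * t)))


def exact_class3(t):
    """Class 3 probability from the closed-form exact solution."""
    # Contribution of a start in class 2 (slow rate 2.25e-6 per sec).
    from_class2 = (-6.66655e-1 * math.exp(-2.25e-6 * t)
                   - 1.12501e-5 * math.exp(-4.0e-1 * t)
                   + 6.66667e-1)  # constant assumed, see lead-in
    # Contribution of a start in class 1 (faster rate 3.0e-5 per sec).
    from_class1 = (-3.33308e-1 * math.exp(-3.0e-5 * t)
                   - 2.50019e-5 * math.exp(-4.0e-1 * t)
                   + 3.33333e-1)
    return 0.5 * from_class2 + 0.5 * from_class1


TIMES = [0.0, 10.0, 1.0e3, 1.0e5, 1.0e6, 1.0e7]
max_error = max(abs(approx_class3(t) - exact_class3(t)) for t in TIMES)
```

Both expressions start near zero and level off at 0.5, since half of the initial probability mass sits in each of classes 1 and 2 and the branching probabilities into class 3 sum to one across the two.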


Comparison  of  results 

The approximate and exact probabilities in class 3 are compared in Figure
5-13. It can be concluded from the comparison of results in the figure that the
approximate aggregated Markov process approximates well the behavior of a model
in which two classes can both transit to a single trapping class.

Figure 5-13: Comparison of approximate and exact probability in class 3

5.6  Closure

In this chapter, five models were created and the approximate Markov process
technique was tested further, beyond the 9-state model. The five different cases
represent a range of class-to-class transition structures, which include transitions
from ergodic class to ergodic class, transitions from one class to two different
classes, two-way communicating classes, transitions from two different classes to a
single class, and non-ergodic classes. All the approximate aggregated Markov
processes in these five cases characterized the behavior of the exact aggregated
4-state models very well. That is, the class probability distributions were well
approximated by the approximate Markov processes. Furthermore, the normalized
probability distribution trajectories converged almost exactly to the stationary
probability distribution of the corresponding non-perturbed semi-Markov process
after a brief transient period.

In conclusion, for these five models, the state probability distributions can be
well approximated by expanding the enlarged process results with the stationary
probability distributions of the non-perturbed processes. It is speculated that, in
general, the approximate technique works well for most fault-tolerant system models.
However, there are some limitations on applying these results to certain types
of system models. These shortcomings will be discussed in Chapter 7.


Chapter  6

Relaxation  of  Ergodicity  Condition 

The second sufficient condition stated in Chapter 2 for the approximate
Markov process to be non-trivial is that the imbedded Markov chain of the
non-perturbed process within each class must be ergodic. However, it was shown in
Case IV in the last chapter that both elements of the approximate results can be
valid even when the non-perturbed processes are both non-ergodic. Although the
classes for Case IV are non-ergodic, a stationary probability distribution could be
found for both classes, and each is unique. This led to further investigation of the
sufficient conditions for the semi-Markov processes to be approximated by the
approximate technique, and the result is that Korolyuk's Theorem can be modified
as follows:

Theorem: If a semi-Markov process depends on a small parameter $\epsilon$
such that its state space can be partitioned according to Eq. (2.5) and is
time-scaled according to Eq. (2.7), and additionally if the transition
probability operator $P_k$ for the imbedded Markov process of the k-th
class of the non-perturbed semi-Markov process satisfies

$$ \lim_{n\to\infty} \frac{1}{n} \sum_{l=1}^{n} P_k^{\,l} = [\, e \;\; e \;\; \cdots \;\; e \,] \tag{6.1} $$

then the aggregated semi-Markov process can be approximated by the
enlarged process defined by Eq. (2.19).

Proof: The proof follows an identical line of reasoning to the proof in
Chapter 2 until the point where the functions $\phi^{(i)}_{rk}(s)$ are shown to satisfy
the system Eq. (2.1A). The system equations can be rewritten in linear
equation vector form:

$$ \underline{\phi}_{rk}(s)^T = \underline{\phi}_{rk}(s)^T P_k \tag{6.2} $$

Postmultiplying the above equation by the operator $P_k$ and using Eq.
(6.2) on the result gives:

$$ \underline{\phi}_{rk}(s)^T = \underline{\phi}_{rk}(s)^T P_k^{\,2} \tag{6.3} $$

By successively postmultiplying the system of equations by $P_k$, replacing
the left hand side by $\underline{\phi}_{rk}(s)^T$ each time, and averaging an infinite
number of these equations:

$$ \underline{\phi}_{rk}(s)^T = \underline{\phi}_{rk}(s)^T \lim_{n\to\infty} \frac{1}{n} \sum_{l=1}^{n} P_k^{\,l} \tag{6.4} $$

Since the operator $P_k$, defined by its elements $p^{(k)}_{ij}$, satisfies Eq. (6.1), then, by
linear equation theory, the solution of the system of equations in Eq. (6.4) is
independent of the superscript, that is:

$$ \phi^{(i)}_{rk}(s) = \phi_{rk}(s), \qquad i \in E_k \tag{6.5} $$

The remainder of the proof that the aggregated model is Markovian and
the derivation of parameters of the approximate Markov process will be
exactly the same as that of the remainder of the proof in Chapter 2.

This extended Theorem is a relaxation of the ergodicity sufficient condition,
stated earlier in Chapter 2, that is imposed on the semi-Markov process to be
approximated.

It is of interest to find conditions under which Eq. (6.1) is satisfied. Along
these lines, the following theorem is established:

Theorem  : 

(1)  If  the  imbedded  Markov  process,  which  is  defined  by  the  transition 
operator  Pk  of  the  k-th  class  of  the  non-perturbed  semi-Markov  process 
is  ergodic, 

or 

(2)  If  it  is  nonergodic  with  one  and  only  one  eigenvalue  of  unity, 
then  the  operator  Pk  satisfies  Eq.  (6.1). 

Proof : 

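(1) By the ergodic Theorem,

$$ \lim_{l\to\infty} P_k^{\,l} = \Pi_k = [\, e \;\; e \;\; \cdots \;\; e \,] \tag{6.6} $$

and,

$$ \lim_{n\to\infty} \frac{1}{n} \sum_{l=1}^{n} P_k^{\,l} = \lim_{n\to\infty} \frac{1}{n} \left( \sum_{l=1}^{r} P_k^{\,l} + \sum_{l=r+1}^{n} P_k^{\,l} \right) \tag{6.7} $$

where r is finite but large such that,

$$ P_k^{\,l} \approx \Pi_k, \qquad l > r \tag{6.8} $$

Therefore, Eq. (6.7) can be reduced to

$$ \lim_{n\to\infty} \frac{1}{n} \sum_{l=1}^{n} P_k^{\,l} = \lim_{n\to\infty} \frac{1}{n} \sum_{l=r+1}^{n} P_k^{\,l} $$

By Eq. (6.8), it follows:

$$ \lim_{n\to\infty} \frac{1}{n} \sum_{l=1}^{n} P_k^{\,l} = \Pi_k = [\, e \;\; e \;\; \cdots \;\; e \,] \tag{6.9} $$

which proves the Theorem for this case.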

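(2) The operator $P_k$ can be put in Jordan form by the following
transformation:

$$ P_k = T \Lambda_k T^{-1} \tag{6.10} $$

where T is a square invertible matrix with columns made up of the right
eigenvectors (or generalized right eigenvectors) of the operator $P_k$. By a
proper ordering, $\Lambda_k$ has the form:

$$
\Lambda_k =
\begin{bmatrix}
\lambda_1 & & & \\
 & \ddots & & \\
 & & \lambda_p & \\
 & & & J
\end{bmatrix}
$$

where $\{\lambda_1, \ldots, \lambda_p\}$ are the unit magnitude eigenvalues and J is a Jordan
form matrix containing all the eigenvalues of less than unit magnitude
on its main diagonal. (This form is known to exist for a stochastic
matrix $P_k$ because the unit magnitude eigenvalues must have a full set of
linearly independent eigenvectors.) Therefore:

$$ \frac{1}{n} \sum_{l=1}^{n} P_k^{\,l} = T \left[ \frac{1}{n} \sum_{l=1}^{n} \Lambda_k^{\,l} \right] T^{-1} \tag{6.11} $$

Since $\Lambda_k$ has one and only one eigenvalue of one:

$$ \lim_{n\to\infty} \frac{1}{n} \sum_{l=1}^{n} \Lambda_k^{\,l} = \text{diagonal matrix with a single non-zero element of unity on its main diagonal} $$

because $\lim_{n\to\infty} \frac{1}{n} \sum_{l=1}^{n} J^{\,l} = 0$ and $\lim_{n\to\infty} \frac{1}{n} \sum_{l=1}^{n} \lambda_i^{\,l} = 0$ if $|\lambda_i| = 1$ and
$\lambda_i \neq 1$. Because $P_k$ is a stochastic matrix, the left eigenvector appearing
in the row of $T^{-1}$ corresponding to the unit eigenvalue is $[\,1 \;\; 1 \;\; \cdots \;\; 1\,]$.
Therefore, with the unit eigenvalue ordered first:

$$
\left[ \lim_{n\to\infty} \frac{1}{n} \sum_{l=1}^{n} \Lambda_k^{\,l} \right] T^{-1} =
\begin{bmatrix}
1 & 1 & \cdots & 1 \\
0 & 0 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 0
\end{bmatrix}
\tag{6.12}
$$

Therefore:

$$ T \left[ \lim_{n\to\infty} \frac{1}{n} \sum_{l=1}^{n} \Lambda_k^{\,l} \right] T^{-1} = [\, e \;\; e \;\; \cdots \;\; e \,] \tag{6.13} $$

That is,

$$ \lim_{n\to\infty} \frac{1}{n} \sum_{l=1}^{n} P_k^{\,l} = T \left[ \lim_{n\to\infty} \frac{1}{n} \sum_{l=1}^{n} \Lambda_k^{\,l} \right] T^{-1} = [\, e \;\; e \;\; \cdots \;\; e \,] \tag{6.14} $$

which completes the proof.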
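The averaged-power condition of Eq. (6.1) is easy to probe numerically for small classes. The sketch below is an illustration rather than part of the proof: it forms the average of the first n powers of a column-stochastic matrix and checks whether all columns agree. The first matrix is class 1 of Case IV; the periodic and two-trap matrices are hypothetical examples chosen to mimic the valid and invalid structures, and the tolerance and value of n are arbitrary.

```python
def matmul(a, b):
    """Plain square matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def cesaro_average(p, n):
    """Average of p^1, p^2, ..., p^n."""
    size = len(p)
    power = [row[:] for row in p]                 # current power, starts at p^1
    total = [[0.0] * size for _ in range(size)]
    for _ in range(n):
        for i in range(size):
            for j in range(size):
                total[i][j] += power[i][j]
        power = matmul(power, p)
    return [[x / n for x in row] for row in total]


def columns_identical(mat, tol=1e-2):
    """True when every column equals the first one, as Eq. (6.1) requires."""
    return all(abs(row[j] - row[0]) <= tol for row in mat for j in range(len(row)))


TRAPPING = [[0.5, 0.0], [0.5, 1.0]]    # class 1 of Case IV: one trapping state
PERIODIC = [[0.0, 1.0], [1.0, 0.0]]    # hypothetical periodic class
TWO_TRAPS = [[1.0, 0.0], [0.0, 1.0]]   # hypothetical class with two trapping states

results = {
    "trapping": columns_identical(cesaro_average(TRAPPING, 2000)),
    "periodic": columns_identical(cesaro_average(PERIODIC, 2000)),
    "two_traps": columns_identical(cesaro_average(TWO_TRAPS, 2000)),
}
```

The periodic case is the reason the condition is stated with an average: the plain powers of that matrix oscillate forever, yet their average settles to identical columns. The two-trap matrix has a repeated eigenvalue of one and its average keeps distinct columns.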


As an illustration of the implication of the sufficient condition stated in the
second Theorem, valid and invalid examples of state transition structures are
shown in Figure 6-2. Note that one of the valid examples in Figure 6-2a includes
periodic intraclass behavior. The invalid example in Figure 6-2b has 2 eigenvalues
of one because 2 trapping sets are present in a single class.


Figure 6-2a: Valid non-ergodic classes (diagram labels: to and from $E_j$, $E_j \in E$ and $i \neq j$)

Figure 6-2b: Invalid non-ergodic class (diagram label: to and from $E_j$, $E_j \in E$ and $i \neq j$)

As a result, fault-tolerant system models with non-ergodic classes that satisfy
the condition stated in Eq. (6.1) will be approximated well by the approximation
technique developed in this thesis. This explains why the approximate solution is
valid for Case IV in Chapter 5. Note that there may exist fault-tolerant system
models with non-ergodic classes in forms which do not satisfy Eq. (6.1) that may
also be treated by the approximation technique, because the Theorem is a sufficient
but not necessary condition.


Chapter  7 

Some  Limitations 
on  the  Approximate  technique 


The approximate technique was demonstrated to be successful for the 9-state
model and for five 4-state models in Chapters 4 and 5, respectively. The two
elements that comprise the approximate technique, namely the enlarged process
and the stationary probability distribution within each class, are valid for these
examples. However, there are limitations on applying the approximate technique
to certain types of system models. These limitations are discussed in
this chapter.

In the derivation of the approximate Markov process in Chapter 2, the limit
of $\epsilon$ was taken to zero for the aggregated semi-Markov model to behave as a
Markov process. This means that the failure rates of the instruments of a
fault-tolerant system have to be small enough for the results to be well approximated by
the enlarged process. One way to investigate the values of $\epsilon$ for
which the approximate Markov process diverges from the original aggregated
semi-Markov model is by numerical methods. The small parameter $\epsilon$ of the model in Case I
in Chapter 5 was varied and the enlarged process "state" probability history was
compared with the original semi-Markov model class probability history. The
largest absolute error in the class 1 probability history obtained from the enlarged
process is shown in Figure 7-1. It can be seen that the approximate Markov
process starts to diverge when $\epsilon$ reaches the value $2.5\times10^{-2}$, which is one fourth of
the slowest transition rate within class 1. The implication of this result is that the
systems to be approximated must have a small failure rate or small perturbation
parameter relative to the transition rates within each class. If the result in this
particular case can be extrapolated to other cases, then $\epsilon$ should be at least one
order of magnitude smaller than the slowest intraclass transition rate.

Figure 7-1: Largest absolute error in class 1 probability history obtained
from the enlarged process for the model in Case I

In semi-Markov process models, the classes are often defined by the number
of working and failed instruments. Occasionally, the system-loss state of a system is
defined by different numbers of failed instruments; for example, one wrong isolation
may be as catastrophic as two uncovered failures. In these cases, the system model will
contain two or more non-ergodic classes. These non-ergodic classes may not satisfy
the relaxed sufficient condition defined by Eq. (6.1), and failure of the approximate
technique may result.

The restrictions mentioned in this chapter limit the class of fault-tolerant
system models to which the approximate technique can be applied. Note, however,
that for a broad class of models, the relaxed sufficient condition is satisfied and the
validity of the approximate results is assured if $\epsilon$ is small enough.


Chapter  8 

Summary,  Conclusion  and  Suggestions 
for  Further  Research 

8.1  Summary  of  Thesis 

Semi-Markov models of large fault-tolerant systems whose redundancy
management scheme employs sequential tests are usually intractable: the high
computational cost makes it impractical to obtain state probability distribution
histories of the desired length. New methods to evaluate the state probability history of such
systems in an efficient way are needed because of the growing use of complex
fault-tolerant system designs.

This thesis has developed an approximate technique based on enlarged
semi-Markov theory for assessing the state probability distribution histories of models of
fault-tolerant systems that employ sequential tests in their fault detection and
identification logic. Emphasis was placed on the extension of the theory to
fault-tolerant system semi-Markov models. Secondary emphasis was placed on the
demonstration of accuracy of the two elements of the approximate results, which
involve expanding the enlarged process by stationary probability distributions. The
use and accuracy were examined for a 9-state model and for various class-to-class
structures that mimic fault-tolerant system models. An extended theorem, with
the relaxation of the conditions that a fault-tolerant system model must satisfy for
it to be approximated by the enlarged process, has been presented. Also, the
limitations of the approximate technique for certain types of fault-tolerant systems
were discussed.


8.2  Conclusions  and  Contributions 

The  approximate  technique  developed  in  this  thesis  can  be  used  to  quantify 
the  performance  of  those  fault-tolerant  systems  with  component  failure  rates  small
relative  to  the  fault  detection  and  isolation  decision  rates.  This  thesis  has  shown 
that  the  approximate  technique  can  be  a  practical  tool  to  simplify  the 
quantification  of  large  complex  fault-tolerant  system  performance  and  might  also  be 
an  efficient  tool  in  the  synthesis  of  such  system  designs. 

The  contributions  of  this  thesis  can  be  summarized  as  follows: 

(1)  Korolyuk’s  limit  Theorem  was  extended  by  generalizing  the  form  that 
the  transition  kernel  elements  may  take,  in  which  they  depend  through 
the  holding  time  distribution  on  a  time  scale  factor  $\delta$  in  addition  to
depending  on  the  small  parameter  e  that  divides  the  state  space  of  the 
system  into  classes.  An  approximate  technique  based  on  this  extended 
Theorem  was  then  presented,  by  which  the  state  probability  history  of  a 
fault-tolerant  system  semi-Markov  model  can  be  approximated  by 
expanding  a  reduced  order  Markov  process  state  probability  history  by 
the  stationary  probability  distributions  of  the  non-perturbed  processes 
within  the  disjoint  classes.  The  direct  benefit  of  this  approximate 
technique  is  the  reduction  of  the  computational  cost  of  generating 
results.  Therefore,  models  of  large  complex  fault-tolerant  systems 
become  tractable. 

(2)  The approximate technique has been presented here, primarily in
Chapters 3 and 4, in such a way as to illustrate its usage from the
construction of a 9-state semi-Markov fault-tolerant system model to the
evaluation of the approximate solution for this model. Thus, the
material  in  these  two  chapters  provides  an  outline  of  the  general 
procedures  to  be  followed  in  approximating  the  behavior  of  many  fault- 
tolerant  system  semi-Markov  models.  In  addition,  approximate  results 
for  five  cases  of  different  class  to  class  transition  structures  for  fault- 
tolerant  system  models  were  examined  where  one  of  these  models 
contains  two  non-ergodic  classes. 

(3)  Preliminary  results  were  obtained  for  the  effect  of  increasingly  large  e  on 
the  error  of  the  approximate  technique. 

(4)  An  extended  theorem  with  the  relaxation  of  the  ergodicity  condition 
stated  in  Korolyuk’s  original  work  was  presented  and  proved  in  Chapter 
6.  As  a  result,  the  approximate  technique  can  be  applied  to  a  wider 
scope  of  fault-tolerant  system  models  which  includes  those  with  certain 
types of non-ergodic classes. Another theorem, also presented in Chapter
6, establishes properties of the transition probability operator $P_k$ of the
imbedded  Markov  process  for  class  k  within  the  non-perturbed  semi- 
Markov  process  which  imply  satisfaction  of  the  relaxed  sufficient 
condition. 

8.3  Suggestions  for  Further  Work 

The  results  of  the  approximate  technique  and  the  limitations  of  it  suggest 
possible  areas  to  which  further  consideration  might  be  given.  Some  of  these  will  be 
listed  below. 

1.  The realistic model that the approximate technique was applied to in
this thesis is the 9-state model described in Chapter 3. One of the
assumptions  in  the  model  construction  is  that  the  failure  rates  of  all 
three  instruments  are  the  same.  However,  in  more  complex  systems 
there  might  be  several  types  of  instruments  and  each  one  of  these  may 
have  a  different  small  failure  rate.  Then  the  models  of  such  systems 
would  involve  more  than  one  perturbation  parameter.  The  construction 
of  an  approximate  technique  for  such  systems  deserves  investigation. 

2.  There  may  be  situations  for  which  the  semi-Markov  model  of  a  fault- 
tolerant  system  may  be  characterized  by  several  different  orders  of  mean 
time  to  transition  between  states.  This  may  arise  when  the  false  alarm 
rate  or  repair  rate  is  much  slower  than  the  fault  detection  and  isolation 
decision  rate  or  the  self-test  decision  rate  but  is  still  much  higher  than 
the  failure  rate  of  the  instruments.  This  gives  room  for  the  investigation 
of  accuracy  and  convergence  of  the  approximate  solution  for  models  with 
different  combinations  of  relative  order  of  perturbation  parameters  and 
two  or  more  different  orders  of  mean  holding  time  distributions  for 
transitions  between  states. 

3.  The ergodicity condition within Korolyuk's Theorem was relaxed in
Chapter 6; as a result, a wider class of fault-tolerant system models can
be approximated by the approximate technique. It is still of interest to
know how many fault-tolerant system models in real situations fall into
the category of models that do not satisfy this relaxed condition. The
versatility of the approximate technique can be better understood if the
transition structures of general fault-tolerant systems are better known.

4.  In reference [5], the proof of a limit Theorem for semi-Markov processes,
from which the enlarged process is deduced, depends explicitly on the
existence for each class $E_k$ of the inverse operator $[I - P_k + \Pi_k]^{-1}$ where,

$I$ = identity operator

$P_k$ = transition probability operator for the imbedded Markov process of
the non-perturbed semi-Markov process

$\Pi_k$ = Cesàro limit of the multiple step transition operator associated with
$P_k$

As is stated in [5], if $E_k$ is an ergodic class when $\epsilon = 0$ then $[I - P_k + \Pi_k]^{-1}$ is
guaranteed to exist. Hence, the ergodicity of $E_k$ is a sufficient condition
for the existence of $[I - P_k + \Pi_k]^{-1}$, which in turn is a sufficient condition for
the Theorem. However, ergodicity is not necessary for the existence of
the inverse operator. That is, ergodicity is not necessary for the
enlarged process to be valid and this was proved in the Theorem
presented in Chapter 6. Further understanding of this inverse operator
and the relationship with the relaxed condition may lead to further
relaxation of the conditions for applying the approximate technique.
This would allow application of these results to an even wider class of
fault-tolerant system models.

5.  The effect of nonzero $\epsilon$ on the error of the approximate results for Case I
of Chapter 5 was examined in Chapter 7. This provides some insight
into the accuracy of the approximate solution with different $\epsilon$ for that
particular example. However, this needs further investigation for other
more general system models.

6.  MACSYMA is a powerful symbolic manipulation tool and is also a
numerical evaluation software package. Perhaps the ultimate
application of the approximate technique developed here would be to
develop a MACSYMA command "program" that will input the
transition kernel matrix of a fault-tolerant system model and evaluate all
the non-perturbed processes' stationary probability distributions and
enlarged process state probability distribution histories in order to
directly generate the approximate state probability histories. This
package would greatly reduce the time required for reliability engineers
to design or to optimize the parameters of complex fault-tolerant
systems.

Appendix  A

Interval  Transition  Probability  Matrix  of 
Semi-Markov  Processes 

A.1  Interval  Transition  Probability  of  Discrete  Parameter  Semi-Markov 
Processes 

The following material follows that of [3].

Let a time-invariant finite state discrete parameter semi-Markov process be
characterized by the transition kernel elements defined by,

$$ p_{ji}\,h_{ji}(m) = \Pr\{\, \text{transition } i \to j \text{ occurs at sample } m \mid \text{state } i \text{ entered at sample } 0 \,\} \tag{A.1} $$

The first step in deriving the interval transition probabilities is to consider the
waiting time for each state, which is the length of time spent in a state following
its entrance before a transition occurs to the same state or to a different state. In
mathematical terms, if

$$ w_i(m) = \Pr\{\, \text{waiting time} = m \mid \text{enter } i \text{ at } 0 \,\} \tag{A.2} $$

then,

$$ w_i(m) = \sum_{j=1}^{N} p_{ji}\,h_{ji}(m) \tag{A.3} $$

In addition, if ${}^{>}w_i(n)$ denotes the probability that the waiting time in state i is
greater than n samples, then

$$ {}^{>}w_i(n) = \sum_{m=n+1}^{\infty} w_i(m) \tag{A.4} $$

Now let $\phi_{ji}(n)$ be defined as the probability that the discrete-time semi-Markov
process will be in state j at time n given that it entered state i at time zero. Then
by considering the possible ways that the process that started by entering state i at
time zero ends up in state j at time n, the following equation is reached,

$$ \phi_{ji}(n) = \delta_{ji}\,{}^{>}w_i(n) + \sum_{k=1}^{N} p_{ki} \sum_{m=0}^{n} \phi_{jk}(n-m)\,h_{ki}(m) $$

$$ i = 1,2,\ldots,N\,;\quad j = 1,2,\ldots,N\,;\quad n = 0,1,2,\ldots \qquad \delta_{ji} = \begin{cases} 1 & i = j \\ 0 & i \neq j \end{cases} \tag{A.5} $$

This equation can be placed in matrix form, if the following notation is adopted,

$$ W(m) = \{\, \delta_{ji}\, w_i(m) \,\}, \qquad {}^{>}W(n) = \{\, \delta_{ji}\, {}^{>}w_i(n) \,\}, \qquad \{\, P \circ H(m) \,\}_{ji} = p_{ji}\,h_{ji}(m). \tag{A.6} $$

Then by interchanging the order of summation, Eq. (A.5) can be rewritten as,

$$ \Phi(n) = {}^{>}W(n) + \sum_{m=0}^{n} \Phi(n-m)\,[\, P \circ H(m) \,], \qquad \Phi(0) = I \tag{A.7} $$

A.2  Interval  Transition  Probability  Matrix  of  Continuous  Parameters 
Semi-Markov  Processes 


In the continuous parameter case, let the semi-Markov process be
characterized by the transition kernel elements defined by,

$$p_{ji}(t) = p_{ji}\,h_{ji}(t) \tag{A.8}$$

and let the waiting-time density and the probability that the waiting time exceeds $t$ be defined by,

$$w_i(t) = \sum_{j=1}^{N} p_{ji}\,h_{ji}(t) \tag{A.9}$$

$${}^{>}w_i(t) = \int_{t}^{\infty} w_i(\tau)\,d\tau \tag{A.10}$$


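Then, by reasoning similar to the derivation in A.1, the continuous
parameter interval transition probability can be expressed as,

$$\phi_{ji}(t) = \delta_{ji}\,{}^{>}w_i(t) + \sum_{k=1}^{N} p_{ki} \int_{0}^{t} \phi_{jk}(t-\tau)\,h_{ki}(\tau)\,d\tau \tag{A.11}$$

$$i = 1,2,\ldots,N;\quad j = 1,2,\ldots,N;\quad t \ge 0$$

$$\delta_{ji} = \begin{cases} 1, & i = j \\ 0, & i \neq j \end{cases}$$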


or  in  matrix  form, 


$$\Phi(t) = {}^{>}W(t) + \int_{0}^{t} d\tau\;\Phi(t-\tau)\,\big(P \circ H(\tau)\big),\qquad \Phi(0) = I \tag{A.12}$$


Appendix B

Characteristic  of  2nd  Order  Erlang 
Probability  Density  Function 


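An Erlang random variable $T$ of order 2 is characterized by the following
probability density function:

$$f_T(t) = \begin{cases} \lambda^2\,t\,e^{-\lambda t}, & t \ge 0 \\ 0, & \text{otherwise} \end{cases} \tag{B.1}$$

and a typical sketch of this function looks like the following,

[Figure: sketch of $f_T(t)$ versus $t$]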


The function has the following characteristics:

$\bar{t}$, the expected value of the random variable $T$, is given by:

$$\bar{t} = E[T] = \int_{0}^{\infty} t\,f_T(t)\,dt = \frac{2}{\lambda} \tag{B.2}$$


$t^*$, the time at which the function has its maximum value, can be evaluated by
differentiating the function once and setting the result equal to zero,

$$\frac{d f_T}{dt} = \lambda^2\,e^{-\lambda t}\,\{\,1 - \lambda t\,\} = 0$$

Therefore,

$$t^* = \frac{1}{\lambda} \tag{B.3}$$

The cumulative probability up to time $t^*$ is,

$$\Pr\{\,T < t^*\,\} = \int_{0}^{t^*} f_T(t)\,dt = \text{shaded area} = 0.2642 \tag{B.4}$$
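The three characteristics above are easy to check numerically. The sketch below uses the closed-form cumulative distribution $F_T(t) = 1 - e^{-\lambda t}(1 + \lambda t)$, which follows from integrating the density by parts; the function names and the sample rate `lam = 0.05` are illustrative choices, not values taken from this appendix.

```python
import math


def erlang2_pdf(t, lam):
    """Second-order Erlang density: lam^2 * t * exp(-lam * t) for t >= 0."""
    return lam * lam * t * math.exp(-lam * t) if t >= 0 else 0.0


def erlang2_cdf(t, lam):
    """Closed-form integral of the density from 0 to t."""
    return 1.0 - math.exp(-lam * t) * (1.0 + lam * t) if t >= 0 else 0.0


lam = 0.05                      # illustrative rate
mean = 2.0 / lam                # expected value of T
mode = 1.0 / lam                # time at which the density peaks
prob_before_mode = erlang2_cdf(mode, lam)   # equals 1 - 2/e, about 0.2642
```

The probability accumulated up to the peak, $1 - 2/e$, does not depend on $\lambda$, which is why a single number can be quoted for it.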


Appendix C

Transition  Kernel  Elements  of  the  9-State 
Model 

Numerical values of parameters for the transition kernel elements:

$\lambda_0 = 0.001 \qquad \lambda_{W0} = 0.05 \qquad \lambda_{F0} = 0.1$

$\lambda_1 = 0.05 \qquad \lambda_{W1} = 0.1 \qquad \lambda_{F1} = 0.05$

Transition kernel elements for transitions within class 1: The $p_{ji}$ and
$q_{ji}$ are defined by Eq. (2.2), and the $a_{ji}$ are defined by Eq. (2.26). The remaining
quantities are:

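$$p_{21}(t) = \lambda_0^2\,t\,e^{-(\lambda_0+3\epsilon)t} \approx \left\{ 1 - \epsilon\,\frac{6}{\lambda_0} \right\} (\lambda_0+3\epsilon)^2\,t\,e^{-(\lambda_0+3\epsilon)t}$$

$p_{21} = 1$

$q_{21} = 6000$

$a_{21} = 2000$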

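$$\begin{aligned}
p_{22}(t) &= \lambda_{W0}^2\,t\,(\lambda_{W1}t + 1)\,e^{-(\lambda_{W0}+\lambda_{W1}+3\epsilon)t} \\
&\approx \left\{ \frac{2\lambda_{W0}^2\lambda_{W1}}{(\lambda_{W0}+\lambda_{W1})^3} - \epsilon\,\frac{18\lambda_{W0}^2\lambda_{W1}}{(\lambda_{W0}+\lambda_{W1})^4} \right\} \tfrac{1}{2}(\lambda_{W0}+\lambda_{W1}+3\epsilon)^3\,t^2\,e^{-(\lambda_{W0}+\lambda_{W1}+3\epsilon)t} \\
&\quad + \left\{ \frac{\lambda_{W0}^2}{(\lambda_{W0}+\lambda_{W1})^2} - \epsilon\,\frac{6\lambda_{W0}^2}{(\lambda_{W0}+\lambda_{W1})^3} \right\} (\lambda_{W0}+\lambda_{W1}+3\epsilon)^2\,t\,e^{-(\lambda_{W0}+\lambda_{W1}+3\epsilon)t}
\end{aligned}$$

$p_{22_1} = 0.148$

$p_{22_2} = 0.111$

$p_{22} = p_{22_1} + p_{22_2} = 0.259$

$q_{22_1} = 8.889$

$q_{22_2} = 4.444$

$q_{22} = q_{22_1} + q_{22_2} = 13.333$

$a_{22_1} = 20$

$a_{22_2} = 13.333$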
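The tabulated $p$, $q$ and $a$ values for the kernel elements that mix a third-order and a second-order Erlang term can be reproduced from the rate parameters. The sketch below assumes that $\lambda_{W0} = 0.05$ and $\lambda_{W1} = 0.1$ (the assignment that reproduces the numbers listed for $p_{22}$) and that the $\epsilon$-coefficients come from the first-order expansion $1/(s + k\epsilon)^n \approx s^{-n}(1 - nk\epsilon/s)$; the function name is hypothetical.

```python
def first_order_coefficients(lam_a, lam_b, k):
    """Expand lam_a^2 * t * (lam_b*t + 1) * exp(-(lam_a + lam_b + k*eps) * t)
    to first order in eps as a mixture of Erlang-3 and Erlang-2 densities.

    Returns ((p1, p2), (q1, q2), (a1, a2)): zero-order weights, eps-coefficients,
    and the means of the two non-perturbed Erlang densities.
    """
    s = lam_a + lam_b
    p1 = 2.0 * lam_a**2 * lam_b / s**3   # weight on (1/2) s^3 t^2 exp(-s t)
    p2 = lam_a**2 / s**2                 # weight on s^2 t exp(-s t)
    q1 = 3.0 * k * p1 / s                # from 1/(s + k*eps)^3 ~ (1 - 3*k*eps/s) / s^3
    q2 = 2.0 * k * p2 / s                # from 1/(s + k*eps)^2 ~ (1 - 2*k*eps/s) / s^2
    a1, a2 = 3.0 / s, 2.0 / s            # Erlang-3 and Erlang-2 means
    return (p1, p2), (q1, q2), (a1, a2)


# Element p22(t): leading rate lambda_W0, second rate lambda_W1, perturbation 3*eps.
p22, q22, a22 = first_order_coefficients(lam_a=0.05, lam_b=0.1, k=3)
```

Swapping the two rates (`lam_a=0.1, lam_b=0.05`) gives the larger weights 0.296 and 0.444 that appear for the companion element, and `k=2` gives the class 2 versions of the same expansion.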


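$$\begin{aligned}
p_{32}(t) &= \lambda_{W1}^2\,t\,(\lambda_{W0}t + 1)\,e^{-(\lambda_{W0}+\lambda_{W1}+3\epsilon)t} \\
&\approx \left\{ \frac{2\lambda_{W1}^2\lambda_{W0}}{(\lambda_{W0}+\lambda_{W1})^3} - \epsilon\,\frac{18\lambda_{W1}^2\lambda_{W0}}{(\lambda_{W0}+\lambda_{W1})^4} \right\} \tfrac{1}{2}(\lambda_{W0}+\lambda_{W1}+3\epsilon)^3\,t^2\,e^{-(\lambda_{W0}+\lambda_{W1}+3\epsilon)t} \\
&\quad + \left\{ \frac{\lambda_{W1}^2}{(\lambda_{W0}+\lambda_{W1})^2} - \epsilon\,\frac{6\lambda_{W1}^2}{(\lambda_{W0}+\lambda_{W1})^3} \right\} (\lambda_{W0}+\lambda_{W1}+3\epsilon)^2\,t\,e^{-(\lambda_{W0}+\lambda_{W1}+3\epsilon)t}
\end{aligned}$$

$p_{32_1} = 0.296$

$p_{32_2} = 0.444$

$p_{32} = p_{32_1} + p_{32_2} = 0.740$

$q_{32_1} = 17.778$

$q_{32_2} = 17.778$

$q_{32} = q_{32_1} + q_{32_2} = 35.556$

$a_{32_1} = 20$

$a_{32_2} = 13.333$

$p_{13}(t) = p_{32}(t)$

$p_{23}(t) = p_{22}(t)$

Transition kernel elements for transitions from class 1 to class 2: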

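$$\begin{aligned}
p_{41}(t) &= 3\epsilon\,(\lambda_0 t + 1)\,e^{-(\lambda_0+3\epsilon)t} \\
&= \epsilon\,\frac{3\lambda_0}{(\lambda_0+3\epsilon)^2}\,(\lambda_0+3\epsilon)^2\,t\,e^{-(\lambda_0+3\epsilon)t}
 + \epsilon\,\frac{3}{(\lambda_0+3\epsilon)}\,(\lambda_0+3\epsilon)\,e^{-(\lambda_0+3\epsilon)t}
\end{aligned}$$

$q_{41_1} = 3000$

$q_{41_2} = 3000$

$q_{41} = q_{41_1} + q_{41_2} = 6000$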

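$$\begin{aligned}
p_{52}(t) &= \epsilon\,(\lambda_{W0}\lambda_{W1}t^2 + \lambda_{W0}t + \lambda_{W1}t + 1)\,e^{-(\lambda_{W0}+\lambda_{W1}+3\epsilon)t} \\
&= \epsilon\,\frac{2\lambda_{W0}\lambda_{W1}}{(\lambda_{W0}+\lambda_{W1}+3\epsilon)^3}\,\tfrac{1}{2}(\lambda_{W0}+\lambda_{W1}+3\epsilon)^3\,t^2\,e^{-(\lambda_{W0}+\lambda_{W1}+3\epsilon)t} \\
&\quad + \epsilon\,\frac{\lambda_{W0}+\lambda_{W1}}{(\lambda_{W0}+\lambda_{W1}+3\epsilon)^2}\,(\lambda_{W0}+\lambda_{W1}+3\epsilon)^2\,t\,e^{-(\lambda_{W0}+\lambda_{W1}+3\epsilon)t} \\
&\quad + \epsilon\,\frac{1}{(\lambda_{W0}+\lambda_{W1}+3\epsilon)}\,(\lambda_{W0}+\lambda_{W1}+3\epsilon)\,e^{-(\lambda_{W0}+\lambda_{W1}+3\epsilon)t}
\end{aligned}$$

$q_{52_1} = 2.963$

$q_{52_2} = 6.667$

$q_{52_3} = 6.667$

$q_{52} = q_{52_1} + q_{52_2} + q_{52_3} = 16.297$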


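$$\begin{aligned}
p_{72}(t) &= 2\epsilon\,(\lambda_{W0}\lambda_{W1}t^2 + \lambda_{W0}t + \lambda_{W1}t + 1)\,e^{-(\lambda_{W0}+\lambda_{W1}+3\epsilon)t} \\
&= \epsilon\,\frac{4\lambda_{W0}\lambda_{W1}}{(\lambda_{W0}+\lambda_{W1}+3\epsilon)^3}\,\tfrac{1}{2}(\lambda_{W0}+\lambda_{W1}+3\epsilon)^3\,t^2\,e^{-(\lambda_{W0}+\lambda_{W1}+3\epsilon)t} \\
&\quad + \epsilon\,\frac{2(\lambda_{W0}+\lambda_{W1})}{(\lambda_{W0}+\lambda_{W1}+3\epsilon)^2}\,(\lambda_{W0}+\lambda_{W1}+3\epsilon)^2\,t\,e^{-(\lambda_{W0}+\lambda_{W1}+3\epsilon)t} \\
&\quad + \epsilon\,\frac{2}{(\lambda_{W0}+\lambda_{W1}+3\epsilon)}\,(\lambda_{W0}+\lambda_{W1}+3\epsilon)\,e^{-(\lambda_{W0}+\lambda_{W1}+3\epsilon)t}
\end{aligned}$$

$q_{72_1} = 5.926$

$q_{72_2} = 13.333$

$q_{72_3} = 13.333$

$q_{72} = q_{72_1} + q_{72_2} + q_{72_3} = 32.592$

$p_{63}(t) = p_{52}(t)$

$p_{83}(t) = p_{72}(t)$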
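A useful consistency check on these elements is that probability is conserved to first order in $\epsilon$: whatever the within-class elements lose through their $q$ coefficients must reappear in the class 1 to class 2 elements. The sketch below applies the check to state 2 using the rounded values tabulated in this appendix; the dictionary names are illustrative.

```python
# (zero-order probability, eps-coefficient) for transitions from state 2
# that stay within class 1.
within_class_1 = {"p22": (0.259, 13.333), "p32": (0.740, 35.556)}

# eps-coefficients for transitions from state 2 into class 2
# (these elements have no zero-order part).
to_class_2 = {"p52": 16.297, "p72": 32.592}

zero_order_total = sum(p for p, _ in within_class_1.values())
eps_lost = sum(q for _, q in within_class_1.values())
eps_gained = sum(to_class_2.values())

# Up to rounding in the last printed digit, the zero-order parts sum to one
# and the two eps totals agree.
balance_error = abs(eps_lost - eps_gained)
```

The same bookkeeping applies to state 1, where the 6000 lost by $p_{21}$ is matched by $q_{41} = 6000$.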


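Transition kernel elements for transitions within class 2:

$$p_{54}(t) = 0.9\,\lambda_1^2\,t\,e^{-(\lambda_1+2\epsilon)t} \approx \left\{ 0.9 - \epsilon\,\frac{3.6}{\lambda_1} \right\} (\lambda_1+2\epsilon)^2\,t\,e^{-(\lambda_1+2\epsilon)t}$$

$p_{54} = 0.9$

$q_{54} = 72$

$a_{54} = 40$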
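The coefficient $q_{54} = 72$ is the first-order sensitivity of the Erlang weight to the perturbation $\epsilon$. The sketch below compares the exact weight $0.9\,\lambda_1^2/(\lambda_1+2\epsilon)^2$ with its linear approximation $0.9 - 72\epsilon$, taking $\lambda_1 = 0.05$; the function names and the test values of $\epsilon$ are illustrative.

```python
LAM1 = 0.05   # rate lambda_1 assumed for the class 2 transitions out of state 4


def exact_weight(eps):
    """Exact weight on the density (lam1 + 2 eps)^2 t exp(-(lam1 + 2 eps) t)."""
    return 0.9 * LAM1**2 / (LAM1 + 2.0 * eps) ** 2


def linear_weight(eps):
    """First-order approximation p54 - eps * q54."""
    return 0.9 - 72.0 * eps


eps = 1e-5
approximation_error = abs(exact_weight(eps) - linear_weight(eps))

# A one-sided finite difference recovers q54 as minus the slope at eps = 0.
step = 1e-6
slope_estimate = (exact_weight(0.0) - exact_weight(step)) / step
```

The approximation error grows quadratically in $\epsilon$, so the linear form is only appropriate while $\epsilon$ is small compared with $\lambda_1$.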


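$$p_{74}(t) = 0.1\,\lambda_1^2\,t\,e^{-(\lambda_1+2\epsilon)t} \approx \left\{ 0.1 - \epsilon\,\frac{0.4}{\lambda_1} \right\} (\lambda_1+2\epsilon)^2\,t\,e^{-(\lambda_1+2\epsilon)t}$$

$p_{74} = 0.1$

$q_{74} = 8$

$a_{74} = 40$

$$\begin{aligned}
p_{55}(t) &= \lambda_{F0}^2\,t\,(\lambda_{F1}t + 1)\,e^{-(\lambda_{F0}+\lambda_{F1}+2\epsilon)t} \\
&\approx \left\{ \frac{2\lambda_{F0}^2\lambda_{F1}}{(\lambda_{F0}+\lambda_{F1})^3} - \epsilon\,\frac{12\lambda_{F0}^2\lambda_{F1}}{(\lambda_{F0}+\lambda_{F1})^4} \right\} \tfrac{1}{2}(\lambda_{F0}+\lambda_{F1}+2\epsilon)^3\,t^2\,e^{-(\lambda_{F0}+\lambda_{F1}+2\epsilon)t} \\
&\quad + \left\{ \frac{\lambda_{F0}^2}{(\lambda_{F0}+\lambda_{F1})^2} - \epsilon\,\frac{4\lambda_{F0}^2}{(\lambda_{F0}+\lambda_{F1})^3} \right\} (\lambda_{F0}+\lambda_{F1}+2\epsilon)^2\,t\,e^{-(\lambda_{F0}+\lambda_{F1}+2\epsilon)t}
\end{aligned}$$

$p_{55_1} = 0.296$

$p_{55_2} = 0.444$

$p_{55} = p_{55_1} + p_{55_2} = 0.740$

$q_{55_1} = 11.852$

$q_{55_2} = 11.852$

$q_{55} = q_{55_1} + q_{55_2} = 23.704$

$a_{55_1} = 20$

$a_{55_2} = 13.333$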


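$$\begin{aligned}
p_{65}(t) &= \lambda_{F1}^2\,t\,(\lambda_{F0}t + 1)\,e^{-(\lambda_{F0}+\lambda_{F1}+2\epsilon)t} \\
&\approx \left\{ \frac{2\lambda_{F1}^2\lambda_{F0}}{(\lambda_{F0}+\lambda_{F1})^3} - \epsilon\,\frac{12\lambda_{F1}^2\lambda_{F0}}{(\lambda_{F0}+\lambda_{F1})^4} \right\} \tfrac{1}{2}(\lambda_{F0}+\lambda_{F1}+2\epsilon)^3\,t^2\,e^{-(\lambda_{F0}+\lambda_{F1}+2\epsilon)t} \\
&\quad + \left\{ \frac{\lambda_{F1}^2}{(\lambda_{F0}+\lambda_{F1})^2} - \epsilon\,\frac{4\lambda_{F1}^2}{(\lambda_{F0}+\lambda_{F1})^3} \right\} (\lambda_{F0}+\lambda_{F1}+2\epsilon)^2\,t\,e^{-(\lambda_{F0}+\lambda_{F1}+2\epsilon)t}
\end{aligned}$$

$p_{65_1} = 0.148$

$p_{65_2} = 0.111$

$p_{65} = p_{65_1} + p_{65_2} = 0.259$

$q_{65_1} = 5.926$

$q_{65_2} = 2.963$

$q_{65} = q_{65_1} + q_{65_2} = 8.889$

$a_{65_1} = 20$

$a_{65_2} = 13.333$

$p_{46}(t) = p_{65}(t)$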

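$$\begin{aligned}
p_{77}(t) &= \lambda_{W0}^2\,t\,(\lambda_{W1}t + 1)\,e^{-(\lambda_{W0}+\lambda_{W1}+2\epsilon)t} \\
&\approx \left\{ \frac{2\lambda_{W0}^2\lambda_{W1}}{(\lambda_{W0}+\lambda_{W1})^3} - \epsilon\,\frac{12\lambda_{W0}^2\lambda_{W1}}{(\lambda_{W0}+\lambda_{W1})^4} \right\} \tfrac{1}{2}(\lambda_{W0}+\lambda_{W1}+2\epsilon)^3\,t^2\,e^{-(\lambda_{W0}+\lambda_{W1}+2\epsilon)t} \\
&\quad + \left\{ \frac{\lambda_{W0}^2}{(\lambda_{W0}+\lambda_{W1})^2} - \epsilon\,\frac{4\lambda_{W0}^2}{(\lambda_{W0}+\lambda_{W1})^3} \right\} (\lambda_{W0}+\lambda_{W1}+2\epsilon)^2\,t\,e^{-(\lambda_{W0}+\lambda_{W1}+2\epsilon)t}
\end{aligned}$$

$p_{77_1} = 0.148$

$p_{77_2} = 0.111$

$p_{77} = p_{77_1} + p_{77_2} = 0.259$

$q_{77_1} = 5.926$

$q_{77_2} = 2.963$

$q_{77} = q_{77_1} + q_{77_2} = 8.889$

$a_{77_1} = 20$

$a_{77_2} = 13.333$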


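$$\begin{aligned}
p_{87}(t) &= \lambda_{W1}^2\,t\,(\lambda_{W0}t + 1)\,e^{-(\lambda_{W0}+\lambda_{W1}+2\epsilon)t} \\
&\approx \left\{ \frac{2\lambda_{W1}^2\lambda_{W0}}{(\lambda_{W0}+\lambda_{W1})^3} - \epsilon\,\frac{12\lambda_{W1}^2\lambda_{W0}}{(\lambda_{W0}+\lambda_{W1})^4} \right\} \tfrac{1}{2}(\lambda_{W0}+\lambda_{W1}+2\epsilon)^3\,t^2\,e^{-(\lambda_{W0}+\lambda_{W1}+2\epsilon)t} \\
&\quad + \left\{ \frac{\lambda_{W1}^2}{(\lambda_{W0}+\lambda_{W1})^2} - \epsilon\,\frac{4\lambda_{W1}^2}{(\lambda_{W0}+\lambda_{W1})^3} \right\} (\lambda_{W0}+\lambda_{W1}+2\epsilon)^2\,t\,e^{-(\lambda_{W0}+\lambda_{W1}+2\epsilon)t}
\end{aligned}$$

$p_{87_1} = 0.296$

$p_{87_2} = 0.444$

$p_{87} = p_{87_1} + p_{87_2} = 0.740$

$q_{87_1} = 11.852$

$q_{87_2} = 11.852$

$q_{87} = q_{87_1} + q_{87_2} = 23.704$

$a_{87_1} = 20$

$a_{87_2} = 13.333$

$p_{48}(t) = p_{87}(t)$

$p_{78}(t) = p_{77}(t)$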

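Transition kernel elements for transitions from class 2 to class 3:

$$\begin{aligned}
p_{94}(t) &= 2\epsilon\,(\lambda_1 t + 1)\,e^{-(\lambda_1+2\epsilon)t} \\
&= \epsilon\,\frac{2\lambda_1}{(\lambda_1+2\epsilon)^2}\,(\lambda_1+2\epsilon)^2\,t\,e^{-(\lambda_1+2\epsilon)t}
 + \epsilon\,\frac{2}{(\lambda_1+2\epsilon)}\,(\lambda_1+2\epsilon)\,e^{-(\lambda_1+2\epsilon)t}
\end{aligned}$$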

$q_{94_1} = 40$

$q_{94_2} = 40$

$q_{94} = q_{94_1} + q_{94_2} = 80$

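$$\begin{aligned}
p_{95}(t) &= 2\epsilon\,(\lambda_{F0}\lambda_{F1}t^2 + \lambda_{F0}t + \lambda_{F1}t + 1)\,e^{-(\lambda_{F0}+\lambda_{F1}+2\epsilon)t} \\
&= \epsilon\,\frac{4\lambda_{F0}\lambda_{F1}}{(\lambda_{F0}+\lambda_{F1}+2\epsilon)^3}\,\tfrac{1}{2}(\lambda_{F0}+\lambda_{F1}+2\epsilon)^3\,t^2\,e^{-(\lambda_{F0}+\lambda_{F1}+2\epsilon)t} \\
&\quad + \epsilon\,\frac{2(\lambda_{F0}+\lambda_{F1})}{(\lambda_{F0}+\lambda_{F1}+2\epsilon)^2}\,(\lambda_{F0}+\lambda_{F1}+2\epsilon)^2\,t\,e^{-(\lambda_{F0}+\lambda_{F1}+2\epsilon)t} \\
&\quad + \epsilon\,\frac{2}{(\lambda_{F0}+\lambda_{F1}+2\epsilon)}\,(\lambda_{F0}+\lambda_{F1}+2\epsilon)\,e^{-(\lambda_{F0}+\lambda_{F1}+2\epsilon)t}
\end{aligned}$$

$q_{95_1} = 5.926$

$q_{95_2} = 13.333$

$q_{95_3} = 13.333$

$q_{95} = q_{95_1} + q_{95_2} + q_{95_3} = 32.593$

$$\begin{aligned}
p_{97}(t) &= 2\epsilon\,(\lambda_{W0}\lambda_{W1}t^2 + \lambda_{W0}t + \lambda_{W1}t + 1)\,e^{-(\lambda_{W0}+\lambda_{W1}+2\epsilon)t} \\
&= \epsilon\,\frac{4\lambda_{W0}\lambda_{W1}}{(\lambda_{W0}+\lambda_{W1}+2\epsilon)^3}\,\tfrac{1}{2}(\lambda_{W0}+\lambda_{W1}+2\epsilon)^3\,t^2\,e^{-(\lambda_{W0}+\lambda_{W1}+2\epsilon)t} \\
&\quad + \epsilon\,\frac{2(\lambda_{W0}+\lambda_{W1})}{(\lambda_{W0}+\lambda_{W1}+2\epsilon)^2}\,(\lambda_{W0}+\lambda_{W1}+2\epsilon)^2\,t\,e^{-(\lambda_{W0}+\lambda_{W1}+2\epsilon)t} \\
&\quad + \epsilon\,\frac{2}{(\lambda_{W0}+\lambda_{W1}+2\epsilon)}\,(\lambda_{W0}+\lambda_{W1}+2\epsilon)\,e^{-(\lambda_{W0}+\lambda_{W1}+2\epsilon)t}
\end{aligned}$$

$q_{97_1} = 5.926$

$q_{97_2} = 13.333$

$q_{97_3} = 13.333$

$q_{97} = q_{97_1} + q_{97_2} + q_{97_3} = 32.593$

$p_{98}(t) = p_{97}(t)$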

Since class 3 is a trapping class and consists of only one state, the form of the
transition kernel is not important.

Appendix D

Stationary  Probability  Distribution  of  the 
Non-perturbed  Semi-Markov  Chain  in 
Class  2 


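Using Eq. (4.10), the mean holding times for each transition of the
non-perturbed process in class 2 are:

$$\bar{\tau}_{54} = \frac{2}{\lambda_1} = 40$$

$$\bar{\tau}_{74} = \bar{\tau}_{54} = 40$$

$$\bar{\tau}_{55} = \frac{p_{55_1}}{p_{55}}\,a_{55_1} + \frac{p_{55_2}}{p_{55}}\,a_{55_2} = 16$$

$$\bar{\tau}_{65} = \frac{p_{65_1}}{p_{65}}\,a_{65_1} + \frac{p_{65_2}}{p_{65}}\,a_{65_2} = 17.143$$

$$\bar{\tau}_{46} = \bar{\tau}_{65} = 17.143$$

$$\bar{\tau}_{56} = \bar{\tau}_{55} = 16$$

$$\bar{\tau}_{77} = \bar{\tau}_{22} = 17.143$$

$$\bar{\tau}_{87} = \bar{\tau}_{32} = 16$$

$$\bar{\tau}_{48} = \bar{\tau}_{87} = 16$$

$$\bar{\tau}_{78} = \bar{\tau}_{77} = 17.143$$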

<5  t  i 


So the mean holding time in each state, unconditioned on the destination, is
calculated by Eq. (4.9) and is given as follows:

$\bar{\tau}_4 = 40$

$\bar{\tau}_5 = 16.296$

$\bar{\tau}_6 = 16.296$

$\bar{\tau}_7 = 16.296$

$\bar{\tau}_8 = 16.296$

Then,  the  mean  holding  time  of  the  non-perturbed  process,  as  defined  by  Eq.  (4.8), 
in  class  2  is 

$\bar{\tau} = 17.601$

By  using  Eq.  (4.7),  the  stationary  probability  distribution  in  class  2  of  the  non- 
perturbed  process  is  found  to  be, 

$\pi_4 = 0.12501$

$\pi_5 = 0.68199$

$\pi_6 = 0.17684$

$\pi_7 = 0.00929$

$\pi_8 = 0.00689$
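These figures can be reproduced with a short calculation. The sketch below is not the procedure of Eqs. (4.7) and (4.8) as such, since those equations are not repeated here; it assumes the standard semi-Markov result that the stationary probability of state $i$ is the embedded-chain stationary probability weighted by the mean holding time and normalized by the overall mean holding time. The non-perturbed branching probabilities are taken as 0.9 and 0.1 out of state 4 and as the unrounded fractions 20/27 (printed as 0.740) and 7/27 (printed as 0.259) elsewhere, with $p_{56} = p_{55}$ assumed by analogy with $\bar{\tau}_{56} = \bar{\tau}_{55}$. The function name is hypothetical.

```python
def class2_stationary(iterations=5000):
    """Stationary distribution of the non-perturbed class 2 semi-Markov chain."""
    states = [4, 5, 6, 7, 8]
    hi, lo = 20.0 / 27.0, 7.0 / 27.0
    # Embedded-chain probabilities p[(j, i)] for the transition i -> j.
    p = {
        (5, 4): 0.9, (7, 4): 0.1,
        (5, 5): hi, (6, 5): lo,
        (5, 6): hi, (4, 6): lo,
        (7, 7): lo, (8, 7): hi,
        (4, 8): hi, (7, 8): lo,
    }
    # Mean holding time in each state, unconditioned on the destination.
    tau = {4: 40.0, 5: 440.0 / 27.0, 6: 440.0 / 27.0, 7: 440.0 / 27.0, 8: 440.0 / 27.0}

    # Power iteration for the embedded-chain stationary vector; the self-loops
    # at states 5 and 7 make the chain aperiodic, so the iteration converges.
    pi = {s: 1.0 / len(states) for s in states}
    for _ in range(iterations):
        pi = {j: sum(p.get((j, i), 0.0) * pi[i] for i in states) for j in states}

    tau_bar = sum(pi[s] * tau[s] for s in states)          # overall mean holding time
    phi = {s: pi[s] * tau[s] / tau_bar for s in states}    # time-stationary probabilities
    return pi, tau_bar, phi


embedded, mean_holding_time, stationary = class2_stationary()
```

Under these assumptions the overall mean holding time and the five stationary probabilities agree with the values listed above to within rounding in the last printed digits.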